Formats

While data published in any format can be considered Open Data if associated with an Open Licence, the type of data format used can have significant implications for the usability of the data.

One way to measure the openness of the formats used is through the 5-star deployment scheme for Open Data. The greater the number of stars, the more reusable the data.

Under the Open Data Initiative, public bodies are asked to publish their data in the most open way possible and, at a minimum, in a 3 Star format such as CSV, JSON or XML. Public bodies are also encouraged to publish datasets in multiple formats, for example 1 Star (e.g. PDF) and 2 Star (e.g. Microsoft Excel) in addition to the required 3 Star (e.g. CSV). 4 Star data uses Uniform Resource Identifiers (URIs) to denote things, and 5 Star data also links to other people's data. An example of a 4 and 5 Star data format is RDF (the Resource Description Framework). An example of 5 Star linked data on the portal can be seen here.

An explanation of the different formats and how they can be used, including links to some open source tools, can be seen below.


Openness Rating - 3 Star


What is it?
CSV (comma separated values) files are used to store tabular data in plain text format. Most often the fields in this data are separated by commas, but other delimiters such as | can be used. TSV (tab separated values) files are similar, but fields are delimited by tabs. Both formats are widely supported and are often used to exchange data between different computers and systems.
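As a minimal sketch of how delimiter choice affects reading such files, the Python example below parses the same records in comma-, pipe- and tab-delimited form using the standard library `csv` module. The sample data and the helper name `read_delimited` are illustrative assumptions, not taken from the portal.

```python
import csv
import io

# Illustrative sample data (made up): the same records with three delimiters.
COMMA_TEXT = "station,reading\nA,10\nB,20\n"
PIPE_TEXT = "station|reading\nA|10\nB|20\n"
TAB_TEXT = "station\treading\nA\t10\nB\t20\n"


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows_comma = read_delimited(COMMA_TEXT)
rows_pipe = read_delimited(PIPE_TEXT, delimiter="|")
rows_tab = read_delimited(TAB_TEXT, delimiter="\t")

# All three inputs yield the same records; values are read as strings.
print(rows_comma == rows_pipe == rows_tab)
```

Note that every value comes back as a string, so numeric columns need to be converted explicitly by the consumer.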
What is it?

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The design goals of XML emphasize simplicity, generality, and usability across the Internet. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.
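The following Python sketch shows one way to read simple record-style data from XML with the standard library `xml.etree.ElementTree` module. The element names (`dataset`, `record`, `name`, `value`) and the helper `parse_records` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document (made up): element and attribute names are assumptions.
XML_TEXT = """<dataset>
  <record id="1"><name>Alpha</name><value>10</value></record>
  <record id="2"><name>Beta</name><value>20</value></record>
</dataset>"""


def parse_records(xml_text):
    """Return one dict per <record> element, converting <value> to an int."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": record.get("id"),
            "name": record.findtext("name"),
            "value": int(record.findtext("value")),
        }
        for record in root.findall("record")
    ]


records = parse_records(XML_TEXT)
print(records)
```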

What is it?
JSON (JavaScript Object Notation) files are human-readable text files used to transport data as key/value pairs. The format is widely used on the web and is often the type of data returned from an API call.
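A short Python sketch of working with key/value data of this kind, using the standard library `json` module, is given below. The payload is made up for illustration and does not represent any particular API response.

```python
import json

# Illustrative payload (made up), shaped like a typical API response body.
JSON_TEXT = (
    '{"title": "Example dataset", '
    '"records": [{"name": "Alpha", "value": 10}, {"name": "Beta", "value": 20}]}'
)

# Decode the text into Python dicts and lists.
data = json.loads(JSON_TEXT)

# Values keep their JSON types, so numbers can be used directly.
total = sum(record["value"] for record in data["records"])

# Encode back to text, indented for readability.
round_trip = json.dumps(data, indent=2)
print(total)
print(round_trip)
```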
What is it?
The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products.
What is it?
A .px file (PC-Axis) is used to store "cubic" data. You can think of this as multiple tables within a single file, packaged with relevant metadata.
What is it?
Keyhole Markup Language (KML) files are used for expressing geographic annotation and visualisation within Internet-based, two-dimensional maps and three-dimensional Earth browsers. KML was developed for use with Google Earth.
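Because KML is XML-based, it can be read with a general XML parser. The Python sketch below extracts point placemarks from a small KML 2.2 document using the standard library; the sample placemark and the helper `placemark_points` are illustrative assumptions. KML stores coordinates as longitude, latitude and optional altitude.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Illustrative document (made up) containing a single point placemark.
KML_TEXT = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example point</name>
      <Point><coordinates>-6.2603,53.3498,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemark_points(kml_text):
    """Return (name, latitude, longitude) for each Placemark that has a Point."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point placemark (e.g. a line or polygon)
        # KML order is longitude,latitude[,altitude].
        lon, lat, *_ = (float(part) for part in coords.strip().split(","))
        points.append((name, lat, lon))
    return points


points = placemark_points(KML_TEXT)
print(points)
```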
What is it?
The JSON-stat format is a simple, lightweight JSON format for data dissemination. It is based on a cube model, which arises from the observation that the most common form of data dissemination is the tabular form. In this cube model, datasets are organised in dimensions, and dimensions are organised in categories.
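The cube model can be illustrated with a small Python sketch. It assumes a JSON-stat 2.0 dataset whose `value` is a flat array in row-major order (the last dimension varies fastest) and whose category `index` is given as an array; JSON-stat also allows object forms for both, which this sketch does not handle. The dataset contents and the helper `cell_value` are made up for illustration.

```python
import json

# Illustrative dataset (made up): two dimensions with two categories each.
JSONSTAT_TEXT = """{
  "version": "2.0",
  "class": "dataset",
  "id": ["year", "sex"],
  "size": [2, 2],
  "dimension": {
    "year": {"category": {"index": ["2020", "2021"]}},
    "sex": {"category": {"index": ["F", "M"]}}
  },
  "value": [10, 11, 12, 13]
}"""


def cell_value(dataset, **selection):
    """Look up one cell, given one category id per dimension.

    Assumes array-form "value" and "index" (see the lead-in above).
    """
    offset = 0
    for dim_id, size in zip(dataset["id"], dataset["size"]):
        categories = dataset["dimension"][dim_id]["category"]["index"]
        position = categories.index(selection[dim_id])
        # Row-major order: earlier dimensions have larger strides.
        offset = offset * size + position
    return dataset["value"][offset]


dataset = json.loads(JSONSTAT_TEXT)
print(cell_value(dataset, year="2021", sex="F"))
```

For the selection year 2021 (position 1) and sex F (position 0), the offset is 1 * 2 + 0 = 2, which is the third entry of `value`.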

Openness Rating - 2 Star


What is it?
A file with the XLS file extension is a Microsoft Excel 97-2003 Worksheet file. Later versions of Excel save spreadsheet files in the XLSX format by default. XLS files store data in tables of rows and columns with support for formatted text, images, charts, and more.

Openness Rating - 1 Star


What is it?
The Portable Document Format (commonly referred to as PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.
What is it?
Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript it forms a triad of cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from a webserver or from local storage and render them into multimedia web pages. HTML describes the structure of a web page semantically.
What is it?
ZIP is an archive file format that supports lossless data compression. A .ZIP file may contain one or more files or directories that may have been compressed. The .ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. The .ZIP format is now supported by many software utilities, and both Microsoft Windows and Apple's Mac OS X (from version 10.3) include built-in .ZIP support.
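As a minimal sketch, the Python example below writes a small archive in memory using DEFLATE compression and reads it back with the standard library `zipfile` module. The file names, contents and helper names (`make_zip`, `read_zip`) are illustrative assumptions.

```python
import io
import zipfile


def make_zip(files):
    """Build a DEFLATE-compressed ZIP archive in memory from {name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_zip(data):
    """Return {name: text} for every member of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


# Illustrative contents (made up).
original = {
    "data/readings.csv": "station,reading\nA,10\nB,20\n",
    "README.txt": "Illustrative archive\n",
}
archive_bytes = make_zip(original)
restored = read_zip(archive_bytes)

# Compression is lossless: the restored files match the originals.
print(restored == original)
```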