Formats
3 September 2024
An Irish-language version of this section is available here.
While data published in any format can be considered Open Data if associated with an Open Licence, the type of data format used can have significant implications for the usability of the data.
One way to measure the openness of the formats used is through the 5-star deployment scheme for Open Data. The greater the number of stars, the more reusable the data.
Under the Open Data Initiative, public bodies are asked to publish their data in the most open way possible and, at a minimum, at 3 Star, in formats such as CSV, JSON or XML. Public bodies are also encouraged to publish datasets in multiple formats, for example 1 Star (e.g. PDF) or 2 Star (e.g. Microsoft Excel) in addition to the required 3 Star (e.g. CSV). 4 Star data uses Uniform Resource Identifiers (URIs) to denote things, and 5 Star data also links to other people's data. An example of a 4 and 5 Star data format is RDF (the Resource Description Framework). An example of 5 Star linked data on the portal can be seen here.
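As a rough sketch, the ratings described above can be expressed as a lookup from file extension to star rating. The helper names `star_rating` and `meets_minimum` are hypothetical, the mapping only covers formats discussed on this page, and a real assessment also depends on the licence and on whether URIs and links are used, not on the file extension alone.

```python
from pathlib import Path
from typing import Optional

# Star ratings for the formats described on this page (hypothetical lookup).
# A real 5-star assessment also depends on the licence and on the use of URIs
# and links to other data, not just on the file extension.
STAR_RATING = {
    ".pdf": 1, ".html": 1, ".zip": 1,
    ".xls": 2,
    ".csv": 3, ".tsv": 3, ".json": 3, ".xml": 3,
    ".shp": 3, ".px": 3, ".kml": 3,
    ".rdf": 4,  # at least 4 Star; 5 Star if it also links to other people's data
}


def star_rating(filename: str) -> Optional[int]:
    """Return the star rating for a file name, or None if the format is not listed."""
    return STAR_RATING.get(Path(filename).suffix.lower())


def meets_minimum(filename: str, minimum: int = 3) -> bool:
    """Check a resource against the 3 Star minimum asked for under the Initiative."""
    rating = star_rating(filename)
    return rating is not None and rating >= minimum


print(star_rating("budget-2024.CSV"))  # 3
print(meets_minimum("report.pdf"))     # False
```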
An explanation of the different formats and how they can be used, including links to some open source tools, is given below.
Openness Rating - 3 Star
CSV (comma-separated values) files are used to store tabular data in plain text. The fields are most often separated by commas, but other delimiters, such as the pipe character (|), can be used. TSV (tab-separated values) files are similar, but the fields are delimited by tabs. Both formats are widely supported and are often used to exchange data between different computers and systems.
Most modern spreadsheet packages can open CSV/TSV files for viewing. CSV and TSV files cannot store formatting, so to keep any formatting you add, you will need to save in a spreadsheet format such as XLS (Microsoft Office Excel), ODS (OpenOffice spreadsheets) or .numbers (Apple Numbers), depending on the software you use. A free tool for viewing CSV and other spreadsheet formats online can be found here.
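If you want to read these files in code rather than in a spreadsheet, Python's standard `csv` module handles the different delimiters described above. This is a minimal sketch: `read_delimited` is a hypothetical helper and the sample rows are made up. Note how a quoted field can contain the delimiter itself, which is why a CSV parser is safer than simply splitting lines on commas.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimited text into one dict per row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Made-up sample data in three delimiter styles.
csv_text = 'name,value\n"Smith, J",10\nMurphy,20\n'
tsv_text = "name\tvalue\nKelly\t30\n"
pipe_text = "name|value\nWalsh|40\n"

print(read_delimited(csv_text))        # [{'name': 'Smith, J', 'value': '10'}, {'name': 'Murphy', 'value': '20'}]
print(read_delimited(tsv_text, "\t"))  # [{'name': 'Kelly', 'value': '30'}]
print(read_delimited(pipe_text, "|"))  # [{'name': 'Walsh', 'value': '40'}]
```

For files on disk, open them with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader`; the UTF-8 encoding is an assumption and may differ between publishers.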
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The design goals of XML emphasize simplicity, generality, and usability across the Internet. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.
XML files are generally more useful to developers and software systems. You can import them as tables into Excel or view them using this free online tool.
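For developers, Python's standard `xml.etree.ElementTree` module can turn simple XML records into rows. The `<station>` structure below is a made-up example and `xml_to_rows` is a hypothetical helper; real datasets define their own element names, so the paths would need adjusting.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document; real datasets use their own element names.
SAMPLE_XML = """<stations>
  <station id="S1"><name>Harbour</name><level>1.25</level></station>
  <station id="S2"><name>Bridge</name><level>0.80</level></station>
</stations>"""


def xml_to_rows(xml_text: str) -> list[dict[str, str]]:
    """Flatten each <station> element into a dict of its attributes and child text."""
    root = ET.fromstring(xml_text)
    rows = []
    for station in root.findall("station"):
        row = dict(station.attrib)  # e.g. {"id": "S1"}
        for child in station:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


for row in xml_to_rows(SAMPLE_XML):
    print(row)
# {'id': 'S1', 'name': 'Harbour', 'level': '1.25'}
# {'id': 'S2', 'name': 'Bridge', 'level': '0.80'}
```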
JSON (JavaScript Object Notation) files are human-readable text files used to transport data as key/value pairs. JSON is widely used on the web and is often the format of the data returned by an API call.
Although JSON is human-readable, it is generally used by software as a data source. A useful free tool for viewing JSON in interesting ways can be found here.
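A minimal sketch of consuming JSON in Python with the standard `json` module, assuming a hypothetical API response: JSON objects become dictionaries, arrays become lists, and the whole structure can be queried directly.

```python
import json

# Hypothetical API response body, for illustration only.
response_text = (
    '{"dataset": "example", "count": 2, '
    '"records": [{"id": 1, "open": true}, {"id": 2, "open": false}]}'
)

data = json.loads(response_text)  # objects -> dict, arrays -> list, true -> True
open_ids = [record["id"] for record in data["records"] if record["open"]]
print(data["count"], open_ids)  # 2 [1]

# Pretty-printing with indentation makes the same data easier for people to read.
print(json.dumps(data["records"][0], indent=2))
```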
The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products.
Esri, the creator of the shapefile format, offers a desktop tool for download that can open these files.
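A shapefile is actually a set of files sharing one name, typically at least `.shp` (geometry), `.shx` (index) and `.dbf` (attributes). As a sketch of what the `.shp` part holds, the function below decodes its fixed 100-byte header using only Python's `struct` module, following Esri's published shapefile technical description. `read_shp_header` and the file name in the usage comment are hypothetical; for real work a GIS tool or a dedicated library is a better choice.

```python
import struct

# Shape type codes from Esri's shapefile technical description.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}


def read_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])       # big-endian, always 9994
    (length_words,) = struct.unpack(">i", header[24:28])  # length in 16-bit words
    version, shape_type = struct.unpack("<2i", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994 or version != 1000:
        raise ValueError("not a shapefile header")
    return {
        "file_length_bytes": length_words * 2,
        "shape_type": SHAPE_TYPES.get(shape_type, f"unknown ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Usage with a hypothetical file name:
# with open("roads.shp", "rb") as f:
#     print(read_shp_header(f.read(100)))
```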
A .px file (PC-Axis) is used to store "cubic" data. You can think of this as multiple tables within a single file, packaged with relevant metadata.
These files can be opened with standard spreadsheet software but are more useful in dedicated desktop applications.
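To show how a PX file packages tables with metadata, here is a deliberately simplified parser for a made-up sample. It assumes plain `KEYWORD=value;` entries, one stub (row) variable, one heading (column) variable and data values listed row by row; it does not handle language suffixes such as `TITLE[en]`, missing-value symbols or cubes with several stub or heading variables, so treat it as an illustration rather than a PX reader. `parse_px` and `px_table` are hypothetical names.

```python
import re

# A tiny, made-up PX file; real files carry many more keywords.
SAMPLE_PX = """CHARSET="ANSI";
TITLE="Example by sex and year";
STUB="Sex";
HEADING="Year";
VALUES("Sex")="Male","Female";
VALUES("Year")="2022","2023";
DATA=
100 110
105 115;
"""

# KEYWORD or KEYWORD("variable"), then "=", then everything up to the next ";".
ENTRY = re.compile(r'([A-Z-]+(?:\("[^"]*"\))?)=(.*?);', re.S)


def parse_px(text: str) -> dict[str, list]:
    """Split PX text into keyword -> list of values (simplified sketch)."""
    meta = {}
    for key, raw in ENTRY.findall(text):
        if key == "DATA":
            meta[key] = [float(x) for x in raw.split()]
        else:
            # Quoted strings become list items; an unquoted value is kept as one item.
            meta[key] = re.findall(r'"([^"]*)"', raw) or [raw.strip()]
    return meta


def px_table(meta: dict) -> dict[str, dict[str, float]]:
    """Rebuild the table, assuming one STUB (row) and one HEADING (column) variable."""
    rows = meta['VALUES("{}")'.format(meta["STUB"][0])]
    cols = meta['VALUES("{}")'.format(meta["HEADING"][0])]
    data = meta["DATA"]  # read row by row in this sketch
    width = len(cols)
    return {
        row: dict(zip(cols, data[i * width:(i + 1) * width]))
        for i, row in enumerate(rows)
    }


print(px_table(parse_px(SAMPLE_PX)))
# {'Male': {'2022': 100.0, '2023': 110.0}, 'Female': {'2022': 105.0, '2023': 115.0}}
```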
Keyhole Markup Language (KML) files are used for expressing geographic annotation and visualisation within Internet-based, two-dimensional maps and three-dimensional Earth browsers. KML was developed for use with Google Earth.
You can drag and drop a KML file into this tool to view a map.
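Because KML is XML, Python's standard `xml.etree.ElementTree` module can read it directly, provided the KML 2.2 namespace is supplied. The sample document and the `placemarks` helper are made up for illustration and only handle `Point` geometries.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# A minimal, made-up KML document with a single placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example point</name>
      <Point><coordinates>-6.2603,53.3498,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemarks(kml_text: str) -> list[tuple[str, float, float]]:
    """Return (name, longitude, latitude) for every Point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords:
            # KML writes longitude first, then latitude, then an optional altitude.
            lon, lat = (float(v) for v in coords.strip().split(",")[:2])
            points.append((name, lon, lat))
    return points


print(placemarks(SAMPLE_KML))  # [('Example point', -6.2603, 53.3498)]
```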
The JSON-stat format is a simple, lightweight JSON format for data dissemination. It is based on a cube model that arises from the observation that the most common form of data dissemination is the tabular form. In this cube model, datasets are organised into dimensions, and dimensions are organised into categories.
JSON-stat is generally used to generate table views of the data. There is a web tool for this format here.
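The cube model can be illustrated with a tiny, made-up JSON-stat 2.0 dataset. In JSON-stat the `value` array is flattened in row-major order, with the last dimension listed in `id` varying fastest, so a cell's position is computed from each dimension's category index and `size`. The helper names are hypothetical, and the sketch assumes a dense `value` array and an explicit category `index` for every dimension.

```python
import json

# A tiny, made-up JSON-stat 2.0 dataset with two dimensions.
SAMPLE = json.loads("""
{
  "version": "2.0",
  "class": "dataset",
  "id": ["sex", "year"],
  "size": [2, 2],
  "dimension": {
    "sex": {"category": {"index": ["M", "F"], "label": {"M": "Male", "F": "Female"}}},
    "year": {"category": {"index": {"2022": 0, "2023": 1}}}
  },
  "value": [100, 110, 105, 115]
}
""")


def category_position(dataset: dict, dim: str, code: str) -> int:
    """Position of a category code; "index" may be an array or a code -> position object."""
    index = dataset["dimension"][dim]["category"]["index"]
    return index.index(code) if isinstance(index, list) else index[code]


def value_at(dataset: dict, **coords: str):
    """Look up one cell; values are row-major with the last dimension varying fastest."""
    flat = 0
    for dim, size in zip(dataset["id"], dataset["size"]):
        flat = flat * size + category_position(dataset, dim, coords[dim])
    return dataset["value"][flat]


print(value_at(SAMPLE, sex="F", year="2022"))  # 105
```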
Openness Rating - 2 Star
A file with the XLS extension is a Microsoft Excel 97-2003 Workbook file. Later versions of Excel save spreadsheet files in the XLSX format by default. XLS files store data in tables of rows and columns, with support for formatted text, images, charts and more.
Open XLS files in Microsoft Excel. From there you can create charts, pivot tables and other formatting or visualisations. A free tool for viewing CSV and other spreadsheet formats online can be found here.
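The difference between the older XLS format and XLSX is visible in the files themselves: XLS uses the binary OLE2 compound-file container, while XLSX is a ZIP archive of XML parts. The sketch below guesses which one a file is from its bytes. `spreadsheet_kind` is a hypothetical helper; it checks for the conventional `xl/workbook.xml` part and cannot tell an XLS file apart from other Office 97-2003 files that share the OLE2 signature.

```python
import io
import zipfile

# Signature at the start of OLE2 compound files, used by Excel 97-2003 .xls files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def spreadsheet_kind(data: bytes) -> str:
    """Guess whether raw file bytes hold a legacy XLS or an XLSX workbook."""
    if data.startswith(OLE2_SIGNATURE):
        return "xls"  # could also be another Office 97-2003 file, such as .doc
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx"
    return "unknown"


# Usage with a hypothetical downloaded file:
# with open("download.xls", "rb") as f:
#     print(spreadsheet_kind(f.read()))
```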
Openness Rating - 1 Star
The Portable Document Format (commonly referred to as PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.
Open in any PDF reading software, for example Adobe Acrobat Reader (get it free here).
Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript it forms a triad of cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from a webserver or from local storage and render them into multimedia web pages. HTML describes the structure of a web page semantically.
Web pages are viewed in your browser. On Data.gov.ie, resources with an HTML extension will commonly lead to a page where you can view and manipulate data on the publisher's own site.
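When a publisher only offers data inside an HTML page, the table cells can be pulled out with Python's standard `html.parser` module. This is a minimal sketch on a made-up snippet: `TableExtractor` is a hypothetical class that ignores nested tables, `colspan` and `rowspan`. Libraries such as pandas (`pandas.read_html`) handle messier pages more robustly.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_data(self, data):
        # Text may arrive in several pieces (e.g. around <b> tags), so accumulate it.
        if self._in_cell:
            self.rows[-1][-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self._in_cell = False


# Made-up page fragment standing in for a publisher's HTML table.
page = ("<table><tr><th>Year</th><th>Total</th></tr>"
        "<tr><td>2023</td><td> 42 </td></tr></table>")
parser = TableExtractor()
parser.feed(page)
parser.close()
print(parser.rows)  # [['Year', 'Total'], ['2023', '42']]
```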
ZIP is an archive file format that supports lossless data compression. A .ZIP file may contain one or more files or directories that may have been compressed. The .ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. The .ZIP format is now supported by many software utilities: Microsoft has included built-in .ZIP support in Windows, and Apple has included it in macOS since Mac OS X 10.3.
If you have downloaded a ZIP file from data.gov.ie, the data you want will be inside it. Extract the contents and open them in the appropriate software.
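Extraction can also be scripted with Python's standard `zipfile` module. In this sketch the archive is built in memory to stand in for a downloaded resource, and `list_zip` and `extract_zip` are hypothetical helpers. Listing the members first lets you check their names and sizes before extracting, which is advisable for archives from any source.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def list_zip(data: bytes) -> list[tuple[str, int, int, bool]]:
    """Return (name, original size, compressed size, uses DEFLATE) for each member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, info.file_size, info.compress_size,
             info.compress_type == zipfile.ZIP_DEFLATED)
            for info in zf.infolist()
        ]


def extract_zip(data: bytes, dest: Path) -> list[Path]:
    """Extract every member into dest and return the paths of the extracted files."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist() if not name.endswith("/")]


# Build a small DEFLATE-compressed archive in memory to stand in for a download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data/readings.csv", "site,value\n" + "A,1\n" * 500)
archive = buf.getvalue()

print(list_zip(archive))  # the compressed size is far below the 2011-byte original

with tempfile.TemporaryDirectory() as tmp:
    for path in extract_zip(archive, Path(tmp)):
        print(path.name, path.read_text().splitlines()[0])  # readings.csv site,value
```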