This guide is for existing and potential Open Data publishers, in particular Government Departments and public bodies who wish to publish Open Data on the data.gov.ie portal. In the following sections, we look at some of the stages of publishing Open Data: reviewing what data the organisation manages, identifying what data should be published as Open Data, ensuring the data is compliant with the recommendations in the Technical Framework, and publishing data as Open Data on data.gov.ie.
If you have any questions about publishing Open Data, please contact the Department of Public Expenditure and Reform at email@example.com
What data do we manage?
When considering the publication of Open Data, it is important for a public body to develop a clear understanding of all the data it holds.
A data audit provides a mechanism to discover what datasets an organisation holds. This enables improved knowledge management, data sharing and evidence-based decision-making. It also helps identify data that is unnecessary and utilising resources, or data that could be improved.
The aim of a data audit is to identify:
- the extent and range of datasets that exist within an organisation,
- whether these datasets are maintained,
- the ranking of datasets in terms of their importance to the delivery of the organisation’s objectives,
- the perceived gaps in useful data that might help the delivery of these objectives,
- the potential for sharing datasets within the organisation and the wider public sector, and
- the potential for publication and making available for re-use.
A Data Audit Template is available from the Department of Public Expenditure and Reform to support public bodies carrying out data audits. A simple data audit method is:
1. Identify the organisation’s activities and information systems that collect, create or manage data.
2. Review each of the data sources defined in step 1 and identify specific datasets. A dataset is a collection of data, published or curated by a single agent.
3. Following the identification of datasets in step 2, describe each dataset using the metadata schema recommended in the Technical Framework.
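The audit method above could be recorded as a simple inventory. The following Python sketch is illustrative only: the field names (`title`, `description`, `publisher`, `source_system`, `maintained`) and the example systems are assumptions made for this example, not the contents of the Data Audit Template or the metadata schema recommended in the Technical Framework.

```python
from dataclasses import dataclass


@dataclass
class AuditEntry:
    """One dataset found during a data audit (hypothetical fields)."""
    title: str
    description: str
    publisher: str
    source_system: str       # the system or activity that holds the data
    maintained: bool = True  # whether the dataset is actively maintained


def audit(sources: dict[str, list[dict]], publisher: str) -> list[AuditEntry]:
    """Walk each data source, list its datasets and describe each one."""
    entries = []
    for system, datasets in sources.items():
        for ds in datasets:
            entries.append(AuditEntry(
                title=ds["title"],
                description=ds.get("description", ""),
                publisher=publisher,
                source_system=system,
                maintained=ds.get("maintained", True),
            ))
    return entries


# Hypothetical input: information systems mapped to the datasets they hold.
sources = {
    "Licensing system": [{"title": "Licences issued per year"}],
    "Finance system": [{"title": "Payments over 20,000", "maintained": False}],
}
inventory = audit(sources, publisher="Example Department")
```

An inventory of this kind can then be sorted or filtered, for example to list datasets that are no longer maintained.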
What data should we publish?
Once an organisation has an overview of the data it manages, the next consideration is what data it should publish as Open Data. According to the Foundation Document for the development of the Public Service Open Data Strategy, all appropriate data should be published as Open Data.
Open Data is considered the default option for appropriate new datasets. Where requested datasets are not released as Open Data, the responsible public body will provide reasons why not.
Therefore, a better question would be what data to publish first, with a view to continuous publication of all data. Open Data that is most useful is high-value data. High-value data can be defined as data that increases accountability, improves public knowledge, furthers the mission of the public body, or creates economic opportunity.
There are many ways an organisation can identify high-value datasets to prioritise for Open Data publication. These include considering:
- datasets that are of high-value internally, i.e. they are used frequently, or by multiple business units,
- direct dataset requests that have been received from the public, e.g. through parliamentary questions or freedom of information requests,
- datasets that are included in, or are sources for, reports, surveys or other public documents that have been published by the organisation, and
- datasets that are sources for Key Performance Indicators used within the organisation.
It is important to note that some data will never be appropriate to publish as Open Data, for example, data whose publication would violate the fundamental right to privacy under data protection legislation, or data that may be classified for security reasons. The Central Statistics Office can provide support on the statistical anonymisation of data for publication purposes.
Is the data compliant with the Technical Framework?
The Open Data Technical Framework provides guidance on the practical aspects of publishing Open Data. This ensures that publication of datasets on data.gov.ie is done in a consistent, persistent and truly open way. The Technical Framework comprises five key components:
- Open Data Licence
  - Data and metadata published on data.gov.ie must be associated with the Creative Commons Attribution (CC-BY) Licence, at a minimum.
- Recommended Formats for Open Data
  - Data published on data.gov.ie must be machine-readable and in an open format (3-star Open Data), e.g. CSV, JSON or XML.
- Recommended Metadata Schema for Open Data
  - Data published on data.gov.ie must be compliant with DCAT-AP, the international Open Data metadata standard.
- Recommended Standards for Open Data
  - Data published on data.gov.ie should use national and international data standards where possible.
- Unique Resource Identifiers
  - Data published on data.gov.ie should use Unique Resource Identifiers where possible.
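Some of the requirements above can be checked automatically before publication. The Python sketch below shows one possible pre-publication check. It is not an official validator: the metadata keys (`title`, `description`, `publisher`, `licence`, `formats`), the licence identifier `CC-BY-4.0` and the choice of required fields are all assumptions made for illustration, and it does not test DCAT-AP conformance.

```python
# Open formats named as examples in the text; the real list may be longer.
OPEN_FORMATS = {"CSV", "JSON", "XML"}
# Assumed identifier for CC-BY; other equally open licences may also qualify.
ACCEPTED_LICENCES = {"CC-BY-4.0"}
# Assumed minimal subset of descriptive metadata, not the full schema.
REQUIRED_FIELDS = ("title", "description", "publisher")


def check_dataset(meta: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not meta.get(field):
            problems.append(f"missing metadata field: {field}")
    if meta.get("licence") not in ACCEPTED_LICENCES:
        problems.append("licence is not CC-BY")
    formats = {f.upper() for f in meta.get("formats", [])}
    if not formats & OPEN_FORMATS:
        problems.append("no machine-readable open format")
    return problems


# Hypothetical metadata records.
ok = check_dataset({
    "title": "Bus stops",
    "description": "Locations of bus stops",
    "publisher": "Example Department",
    "licence": "CC-BY-4.0",
    "formats": ["csv"],
})
bad = check_dataset({"title": "Bus stops", "formats": ["PDF"]})
```

Here `ok` is an empty list, while `bad` reports the missing description, the missing publisher, the licence and the format.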
How to publish data on data.gov.ie?
The data is now ready to be published as Open Data on data.gov.ie. Publication on an Open Data Portal opens the door to innovative reuse of the data. Data.gov.ie does not host the actual datasets; instead, it is a catalogue of metadata with pointers to data hosted elsewhere on the Web, for example, on the website of a public organisation.
Data can be manually published on data.gov.ie via the 'Add Dataset' online form. Alternatively, data can be programmatically published on data.gov.ie via the API or a data harvester.
Add data manually via the data.gov.ie website
In order to add a dataset to data.gov.ie, the organisation requires a user account. An account can be created for Public Sector Bodies by contacting firstname.lastname@example.org
Once logged in, the organisation can access the New Dataset page, as shown in the figure. By stepping through the online form, the organisation is prompted to enter all the necessary metadata. The user-friendly interface makes it easy for non-technical users to publish datasets one at a time. However, the web interface is not an efficient way to publish multiple datasets, or to periodically update existing datasets. For this, programmatic publication via the API should be used.
Add data programmatically using the API
Data.gov.ie is built using CKAN, which provides a powerful API that allows developers to add datasets programmatically. Using the API to create or update datasets is quicker than using the web interface when dealing with multiple datasets or dynamic datasets. More details on how to use the API to push data to the portal are available here
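As a rough illustration of programmatic publication, the Python sketch below builds a request to CKAN's `package_create` action, which creates a dataset through the Action API. The dataset name, organisation, resource URL and API token are hypothetical placeholders, and the portal may require further metadata fields than the ones shown. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_package_create_request(portal_url: str, api_token: str,
                                 dataset: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to CKAN's package_create action."""
    url = portal_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": api_token,  # CKAN reads the token from this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical dataset description; all values are placeholders.
dataset = {
    "name": "example-bus-stops",
    "title": "Example bus stops",
    "owner_org": "example-department",
    "resources": [
        {"url": "https://example.org/bus-stops.csv", "format": "CSV"},
    ],
}
request = build_package_create_request(
    "https://data.gov.ie", "YOUR-API-TOKEN", dataset)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Updating an existing dataset follows the same pattern with the `package_update` action in place of `package_create`.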
Automate the publication of data using a harvester
Building a harvester that fetches data automatically into data.gov.ie makes sense if a lot of data is sourced from one place, for example, another data catalogue, or if data needs to be updated frequently, for example, daily. A harvester pulls the data from a predefined source periodically, adding and updating it automatically in data.gov.ie. It may utilise the portal's API or be built as a custom CKAN extension.
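The core of a harvester is deciding, on each run, which datasets to add and which to update. The Python sketch below shows that comparison step only, under simplifying assumptions: both the source and the catalogue are represented as dictionaries keyed by dataset name, and a hypothetical `modified` field is used to detect changes. It is not how the data.gov.ie harvesters are implemented, and it does not fetch or push any data.

```python
def plan_harvest(source: dict[str, dict],
                 catalogue: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare source records with catalogue records and list the actions needed.

    Returns (action, dataset_name) pairs, where action is the name of the
    CKAN API action a harvester could call for that dataset.
    """
    actions = []
    for name, record in source.items():
        if name not in catalogue:
            actions.append(("package_create", name))
        elif record.get("modified") != catalogue[name].get("modified"):
            actions.append(("package_update", name))
    return actions


# Hypothetical state of a source catalogue and of the portal.
source = {
    "air-quality": {"modified": "2024-01-02"},
    "bus-stops": {"modified": "2024-01-01"},
    "car-parks": {"modified": "2024-01-01"},
}
catalogue = {
    "air-quality": {"modified": "2024-01-01"},
    "bus-stops": {"modified": "2024-01-01"},
}
plan = plan_harvest(source, catalogue)
# plan == [("package_update", "air-quality"), ("package_create", "car-parks")]
```

A scheduled job could run this comparison periodically, for example daily, and then carry out each planned action through the portal's API.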
Data.gov.ie currently harvests data from: