Open Data Technical Framework

Developed in collaboration with the Public Bodies Working Group on Open Data

Document ID
http://data.gov.ie/techncial-framework
Published

Document Status

The Open Data Technical Framework was finalised after a public consultation.

License
Creative Commons Attribution 4.0 International (CC BY 4.0)

Acknowledgement

This document was prepared by the Public Bodies Working Group (PBWG) on Open Data. The Department of Public Expenditure and Reform wishes to acknowledge the important contribution of the PBWG members and to thank them for their commitment and hard work over the last 6 months in developing this framework. The framework underpins the publication of datasets on the portal http://data.gov.ie in line with agreed standards, thus facilitating re-use and interoperability.

  • Eoin McCuirc, Central Statistics Office
  • Dominic Byrne, Fingal County Council
  • Gareth John, Department of Arts, Heritage and the Gaeltacht
  • Liam Stewart, Office of Public Works
  • Niall Hayden, National Roads Authority
  • John Nott, National Transport Authority
  • Sandra Collins, Digital Repository of Ireland
  • Rebecca Grant, Digital Repository of Ireland
  • Tracey Lauriault, NUI Maynooth
  • Pat Mulhall, Office of the Revenue Commissioners
  • Keith Walsh, Office of the Revenue Commissioners
  • Eoin O’Grady, Marine Institute
  • Rob Ovington, Department of Environment, Community and Local Government
  • Brian Costello, Central Statistics Office
  • Adam Leadbetter, Marine Institute
  • Hugh Mangan, Ordnance Survey Ireland
  • Ken Noble, Ordnance Survey Ireland
  • Martin Troy, Department of Public Expenditure and Reform
  • Evelyn O’Connor, Department of Public Expenditure and Reform (Chair)
  • With technical support and advice from Deirdre Lee, Derilinx

Background

Open Data is recognised as a key element of the Public Service Reform agenda and improved data management is an important element of a wide variety of key policy documents and action plans. Ireland has also committed to meeting the challenges set under the G8 Open Data Charter.

Figure: Alignment of Open Data with Other Reform Initiatives

Need for an Open Data Technical Framework

A key priority under the Open Data Initiative of the Minister for Public Expenditure and Reform, Mr. Brendan Howlin T.D., is the development and expansion of the National Open Data Portal, http://data.gov.ie. The objective of the Open Data Portal is to publish government data in a way that will make it more discoverable, accessible, interoperable and reusable. The Portal has been updated to support the publication of high-value datasets to meet demand and contribute to the achievement of real economic, social and democratic benefits for citizens, business and the Public Sector.

This document sets out the proposed technical framework that supports the ongoing implementation of the Open Data Initiative and ensures that publication of datasets on the Open Data Portal, data.gov.ie, is done in a consistent, persistent and truly open way. This is a living document that will be expanded upon as technologies and practices evolve.

This Technical Framework comprises five key components:

  1. Open Data Licence
  2. Recommended Formats for Open Data
  3. Recommended Metadata Schema for Open Data
  4. Recommended Standards for Open Data
  5. Unique Resource Identifiers

Publishing Open Data

The Open Data Technical Framework sets out a planned and structured approach to the publication of datasets as Open Data. Public bodies, when considering publication of Open Data, should take into account the value of datasets, their potential for re-use, and the contribution they can make to delivering better outcomes for citizens, business and other public servants and to improving evidence-based decision making by public bodies.

Decisions on publication of Open Data will ultimately be a matter for individual public bodies, following Data Audits which will be conducted in all public bodies over time.

Data Audits are important because they form the basis for a planned and structured approach to the publication of data as Open Data, taking into account its value, its potential for re-use and the contribution it can make to achieving Public Service reform and national economic objectives. More generally, auditing of datasets should be seen as part of an organisation’s information management strategy.

The output of audits will facilitate publication of datasets on our national portal, http://data.gov.ie. Audits will also promote the effective management and use/sharing of information in public bodies and support the implementation of the Public Service ICT strategy. Audits will enable identification of:

  • The extent and range of datasets that exist and are managed and maintained by a public body
  • The ranking of datasets in terms of their importance to the delivery of Departmental objectives and the perceived gaps in useful data that might help the delivery of these objectives
  • The potential for sharing datasets within the Department and the wider public sector
  • The potential for publication and re-use. Obligations in this regard will increase once the amended PSI Directive has been transposed, and greater publication of and access to Open Data would also be expected to reduce reliance on access to information under FOI
  • Opportunities to build on the recommendations of the 2012 IMF Board paper by placing greater emphasis on the quality of reported data/information

A high level decision process map setting out the key issues to be considered is shown in the figure below. An Open Data publication checklist is included at Annex 2.

Figure: Decision Process Map

Open Data Licence

For a dataset to be considered as Open Data, it must be published under an Open Licence. The European Commission, as part of its ongoing work in relation to the Revision of the PSI Directive, has issued guidelines on recommended licences and datasets. These guidelines encourage “the use of open licences, which should eventually become common practice across the Union”.

Following a public consultation on options for Ireland’s Open Data Licence, 14 responses were received. There was broad support for the use of the Creative Commons Attribution 4.0 International (CC BY 4.0) licence from respondents. This licence lets others distribute, remix, tweak, and build upon data, even commercially, as long as users credit the original publisher for the original creation. CC-BY 4.0 is recommended for maximum dissemination and use of licensed materials. The proposed licence statement and recommended disclaimer statements should be used under the Open Data Initiative.

Recommendations

All data and metadata linked to data.gov.ie will be associated with the Creative Commons Attribution (CC-BY) Licence, at a minimum. Public bodies may waive copyright and associate datasets with CC0, if that is considered appropriate. The licence should be clearly identified in the metadata.

Only datasets associated with the recommended Open Data Licence may be included on data.gov.ie. However, datasets clearly associated with another licence, such as the PSI Licence, may be linked to the Open Data portal provided a commitment is made to using the Open Standard licence within a clearly defined timeframe.
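
The recommendations above can be read as a simple eligibility rule. The following is a minimal sketch only, not part of the framework: the field names `licence` and `open_licence_target_date`, the function name `portal_status` and the three status labels are assumptions made for illustration. The two licence URLs are the public Creative Commons deed addresses.

```python
# Illustrative only: field names and status labels are hypothetical.
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"
CC0_1 = "https://creativecommons.org/publicdomain/zero/1.0/"
OPEN_LICENCES = {CC_BY_4, CC0_1}


def portal_status(metadata):
    """Classify a dataset metadata record against the licence recommendations.

    'include'   - an open licence (CC-BY or CC0) is identified in the metadata
    'link-only' - another licence is identified, with a committed date for
                  moving to the open licence
    'reject'    - no licence identified, or no commitment to move (assumed
                  outcome; the framework does not name this case)
    """
    licence = metadata.get("licence")
    if licence in OPEN_LICENCES:
        return "include"
    if licence and metadata.get("open_licence_target_date"):
        return "link-only"
    return "reject"


if __name__ == "__main__":
    print(portal_status({"licence": CC_BY_4}))
    print(portal_status({"licence": "PSI", "open_licence_target_date": "2016-01-01"}))
```

Keeping the licence as a URL in the metadata, rather than free text, is one way to make it "clearly identified" for automated checks.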

Licence Statement
Under the CC-BY Licence, users must acknowledge the source of the Information in their product or application by including or linking to this attribution statement: “Contains Irish Government Data licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence”.
Multiple Attributions
If using data from several Information Providers and listing multiple attributions is not practical in a product or application, users may include a URI or hyperlink to a resource that contains the required attribution statements.
Disclaimer
All data linked to the Open Data Portal is published “as is”. The Information is licensed “as is” and the Information Provider and/or Licensor excludes all representations, warranties, obligations and liabilities in relation to the Information to the maximum extent permitted by law.
The Information Provider and/or Licensor are not liable for any errors or omissions in the Information and shall not be liable for any loss, injury or damage of any kind caused by its use. The Information Provider does not guarantee the continued supply of the Information.
Exemptions
This licence does not cover personal information, unless sufficiently anonymised and/or aggregated. Nor does it cover third party rights (including, but not limited to, patents, copyright, database rights or trademarks).
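
As an illustration of the Licence Statement and Multiple Attributions rules above, the sketch below shows how an application might produce its attribution notice. It is an assumption-laden example: the function name `attribution_notice`, the `max_inline` threshold for when listing becomes "not practical", and the example URL are all hypothetical. Only the attribution statement text is taken from this document.

```python
# Attribution statement as given in the Licence Statement section.
ATTRIBUTION = (
    "Contains Irish Government Data licensed under a Creative Commons "
    "Attribution 4.0 International (CC BY 4.0) licence"
)


def attribution_notice(statements, attribution_url=None, max_inline=3):
    """Return the text an application could display to credit its sources.

    statements      - attribution statements, one per Information Provider
    attribution_url - hypothetical page listing all required statements
    max_inline      - assumed cut-off above which listing is impractical
    """
    unique = list(dict.fromkeys(statements))  # drop repeats, keep order
    if len(unique) > max_inline and attribution_url:
        return f"Attribution statements: {attribution_url}"
    return "\n".join(unique)


if __name__ == "__main__":
    print(attribution_notice([ATTRIBUTION]))
```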

Unique Resource Identifiers

The Technical Framework sets out steps that will allow public bodies to achieve a minimum 3 Star Open Data. In the longer term, however, the intention is to progress to greater levels of linked data (4 and 5 Star).

The use of Uniform Resource Identifiers (URIs) is an important element of this longer term approach to Open Data. The Public Bodies Working Group will be tasked with developing a URI Strategy and agreeing a URI pattern for use under the Open Data Initiative, drawing on international experience and best practice.

The ongoing development of Open Data and the desire to increase its interoperability have led to an increased reliance on URIs as identifiers for a wide variety of concepts; everything from languages to buildings, public bodies to currencies. URIs are valuable in that they can help distinguish data resources and facilitate unique data identification, comparison and linking. URIs can be used to identify anything from places and people to things and concepts.

It is intended that the outcome of the work of the PBWG will be persistent and scalable URI patterns that will continue to be used even when public bodies change and applications using URIs expand.

Examples of URI patterns

International research indicates that the elements under consideration for inclusion in the National URI Pattern should include:

  1. {domain} element: The {domain} component contains the Internet domain and, optionally, a path within that domain.
  2. {type} element: {type} indicates which kind of URI is involved. This may be:
    • 'id' - identifier of an object (individual/instance) in a register.
    • 'doc' - documentation (metadata) on the object in the register.
    • 'def' - definition of a term in an ontology.
  3. {concept} element: {concept} gives the human reader an indication of the type of concept that is identified by the URI.
  4. {reference} element: {reference} is the identifying name or code of the individual object.

Other possible elements include: {namespace} (for new URI sets placed under common governance), {sector} (the same categories that are included in data.gov.ie), and {language}.
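
To make the four core elements concrete, the sketch below assembles a URI from them. This is an illustration under stated assumptions, not the agreed National URI Pattern, which has yet to be defined: the ordering `http://{domain}/{type}/{concept}/{reference}`, the function name `build_uri` and the domain `data.example.ie` are all hypothetical.

```python
from urllib.parse import quote

# The three {type} values listed above.
ALLOWED_TYPES = {"id", "doc", "def"}


def build_uri(domain, type_, concept, reference):
    """Assemble http://{domain}/{type}/{concept}/{reference}.

    The element order is an assumption for illustration only.
    {domain} may include a path, so it is not percent-encoded here.
    """
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unknown URI type: {type_!r}")
    # Percent-encode the remaining elements so spaces or slashes in a
    # concept or reference cannot change the structure of the URI.
    tail = "/".join(quote(part, safe="") for part in (type_, concept, reference))
    return f"http://{domain.strip('/')}/{tail}"


if __name__ == "__main__":
    print(build_uri("data.example.ie", "id", "school", "12345"))
    print(build_uri("data.example.ie", "doc", "school", "12345"))
```

Using the same {concept} and {reference} with different {type} values lets the identifier of an object and the documentation about it share a predictable relationship.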

Annex 1: Open Data Glossary

This is a general Glossary of Open Data Terms and Acronyms, for use as a reference guide for the Open Data Initiative. This Glossary will be expanded and enhanced as required.

Open Data
Data broadly refers to information, rendered in a human- or computer-readable manner, which may be the subject of research or a raw product of research. A dataset may be considered Open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike. (Open Knowledge Foundation - opendefinition.org)
Open Government Data
Data which has been produced or gathered by public bodies during the course of business activities, and published under an Open Licence.
Data Protection
Data protection legislation protects privacy rights of individuals in relation to the processing of their personal data.
When published openly, datasets must not identify individuals.
Anonymisation and Aggregation
Anonymisation and aggregation can be used to ensure that datasets relating to human subjects comply with relevant data protection legislation before publication.
Anonymisation involves the redaction of information from a dataset where individuals could previously have been identified.
Aggregation involves the publication of a dataset in summary form to exclude personal information which would allow an individual to be identified.
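
The two techniques can be sketched as follows. This is a minimal illustration with invented records; the function names `redact` and `aggregate_by` and the field names are hypothetical, and the code does not by itself guarantee compliance with data protection legislation.

```python
from collections import Counter


def redact(records, fields):
    """Anonymisation sketch: return copies of the records without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


def aggregate_by(records, key):
    """Aggregation sketch: publish only a count of records per value of `key`."""
    return dict(Counter(r[key] for r in records))


# Invented example data, not real individuals.
records = [
    {"name": "Person A", "county": "Dublin"},
    {"name": "Person B", "county": "Dublin"},
    {"name": "Person C", "county": "Cork"},
]

anonymised = redact(records, {"name"})
summary = aggregate_by(records, "county")

if __name__ == "__main__":
    print(anonymised)
    print(summary)
```

Very small counts in a summary can still point to an individual, so a real release would need further review before publication.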
Copyright
Copyright is an area of Intellectual Property law which covers original creative works including literary, dramatic, musical and artistic works, film, sound recordings, broadcasts and the typographical arrangement of published editions, computer software and non-original databases, and performances. Copyright exists from the moment the work is created, and does not require any registration of the work. In the case of a dataset, copyright may belong to an employer, a government department, a funder or another party, depending on the contract surrounding the creation of the data.
Licence
Licensing allows copyright owners to permit approved use and reuse of their work, without relinquishing copyright fully. Licensing can permit both commercial and non-commercial reuse of a work, depending on the terms of the licence, and licences may last in perpetuity or for a specified period. The application of a licence does not mean that a copyright statement should not be applied to a work, and many licences such as Creative Commons suggest that the copyright holder is credited. Open Data is usually associated with an Open Licence such as CC-BY (Creative Commons Attribution Only) or a Public Domain Dedication such as CC0.
CC0
CC0/ Public Domain Dedication or “No Rights Reserved” is not truly a Creative Commons licence, as it does not reserve any rights in a copyright work. Assigning a Public Domain Dedication to a work relinquishes all rights in it, and allows use and re-use of the work for any purpose, without credit to the original author. Essentially this dedication allows works to enter the public domain before the legal term of copyright protection has ended. A Public Domain Dedication cannot be revoked.
Attribution Licence
A licence requiring that the original source of the licensed material is cited (attributed).
CC-BY (Creative Commons Attribution)
This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This licence is recommended for maximum dissemination and reusability of licensed materials.
Machine-readable format
Machine-readable formats are those containing structured data which can be extracted and analysed in an automated way. Examples of machine readable formats include those with a tabular structure such as .xls and .csv, as well as formats such as XML and JSON which are more flexible.
Generally, popular human-readable formats such as Word documents, PDF and HTML include formatting and display information which means that they are not machine-readable.
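
As a small illustration of what "extracted and analysed in an automated way" means, the sketch below reads the same invented dataset from CSV and from JSON using only the Python standard library. The column names and values are made up for the example.

```python
import csv
import io
import json

# Invented sample data in two machine-readable formats.
CSV_TEXT = "station,year,rainfall_mm\nStation A,2013,10.5\nStation B,2013,20.0\n"
JSON_TEXT = (
    '[{"station": "Station A", "year": 2013, "rainfall_mm": 10.5},'
    ' {"station": "Station B", "year": 2013, "rainfall_mm": 20.0}]'
)


def total_from_csv(text, column):
    """Sum a numeric column from CSV text; the header row names the columns."""
    return sum(float(row[column]) for row in csv.DictReader(io.StringIO(text)))


def total_from_json(text, column):
    """Sum the same field from a JSON array of objects."""
    return sum(float(item[column]) for item in json.loads(text))


if __name__ == "__main__":
    print(total_from_csv(CSV_TEXT, "rainfall_mm"))
    print(total_from_json(JSON_TEXT, "rainfall_mm"))
```

No comparable two-line extraction exists for the same table embedded in a Word document or PDF, which is the practical difference the definition describes.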
Open formats
An open format is one where the specifications for the software are available to anyone, free of charge, so that anyone can use these specifications in their own software without any limitations on reuse imposed by intellectual property rights. Open formats include .csv and .xml.
Proprietary formats
A format is proprietary if it encodes data so that a file is readable only by using the same type of software used to create the file. Proprietary software does not openly publish its specifications for reuse. Proprietary formats include .xls (created in Microsoft Excel) and .docx (created in Microsoft Word).
Application Programming Interface
An Application Program Interface (API) is a set of routines, protocols, and tools for building software applications. Programs that use a common API will have similar user interfaces, making it easier for users to learn new programs. (W3C eGov Glossary)

Commonly Used Acronyms

ASCII American Standard Code for Information Interchange

CC Creative Commons

CKAN Comprehensive Knowledge Archive Network

CSO Central Statistics Office of Ireland

CSV Comma Separated Values

DCAT Data Catalog Vocabulary

DCAT-AP DCAT Application Profile

DCMI Dublin Core Metadata Initiative

EC European Commission

ETR European Terrestrial Reference

GeoJSON Geo JavaScript Object Notation

GML Geography Markup Language

GTFS General Transit Feed Specification

HIQA Health Information and Quality Authority

IATI International Aid Transparency Initiative

IETF Internet Engineering Task Force

IFC International Finance Corporation

INSPIRE Infrastructure for Spatial Information in the European Community

ISO International Organization for Standardization

ITM Irish Transverse Mercator

JSON JavaScript Object Notation

KML Keyhole Markup Language

LAS Log ASCII Standard

NetCDF Network Common Data Form

OASIS Organization for the Advancement of Structured Information Standards

ODF OpenDocument Format

OGC Open Geospatial Consortium

OSi Ordnance Survey Ireland

PBWG Public Bodies Working Group

PDF Portable Document Format

PSI Public Sector Information

RDF Resource Description Framework

SDMX Statistical Data and Metadata eXchange

URI Uniform Resource Identifier

W3C World Wide Web Consortium

WGS World Geodetic System

WHO World Health Organization

WKT Well Known Text

WFS Web Feature Service

WMS Web Map Service

XBRL eXtensible Business Reporting Language

XML Extensible Markup Language