An Irish-language version of this section is available here.
Document ID
http://data.gov.ie/opendatatechnicalframework
Published
2015-06-01
The Open Data Technical Framework was finalised after a public consultation.
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
This document was prepared by the Public Bodies Working Group (PBWG) on Open Data. The Department of Public Expenditure and Reform wishes to acknowledge the important contribution of the PBWG members and to thank them for their commitment and hard work over the last six months in developing this framework. The framework underpins the publication of datasets on the portal http://data.gov.ie in line with agreed standards, thus facilitating re-use and interoperability.
Open Data is recognised as a key element of the Public Service Reform agenda and improved data management is an important element of a wide variety of key policy documents and action plans. Ireland has also committed to meeting the challenges set under the G8 Open Data Charter.
Alignment of Open Data with Other Reform Initiatives
A key priority under the Open Data Initiative of the Minister for Public Expenditure and Reform, Mr. Brendan Howlin T.D., is the development and expansion of the National Open Data Portal, http://data.gov.ie. The objective of the Open Data Portal is to publish government data in a way that will make it more discoverable, accessible, interoperable and reusable. The Portal has been updated to support the publication of high-value datasets to meet demand and contribute to the achievement of real economic, social and democratic benefits for citizens, business and the Public Sector.
This document sets out the proposed technical framework that supports the ongoing implementation of the Open Data Initiative and ensures that publication of datasets on the Open Data Portal, data.gov.ie, is done in a consistent, persistent and truly open way. This is a living document that will be expanded upon as technologies and practices evolve.
This Technical Framework comprises five key components:
The Open Data Technical Framework sets out a planned and structured approach to the publication of datasets as Open Data. Public Bodies, when considering publication of Open Data should take into account the value, potential for re-use, and contribution datasets can make to delivering better outcomes for citizens, business, and other public servants and to help improve evidence-based decision making by public bodies.
Decisions on publication of Open Data will ultimately be a matter for individual public bodies, following Data Audits which will be conducted in all public bodies over time.
Data Audits are important in that they form the basis for a planned and structured approach to be taken to the publication of data as Open Data; taking into account the value, potential for re-use and contribution it can make to help achieve Public Service reform and national economic objectives. More generally, auditing of datasets should be seen as part of an organisation’s information management strategy.
The output of audits will facilitate publication of datasets on our national portal, http://data.gov.ie. Audits will also promote the effective management and use/sharing of information in public bodies and support the implementation of the Public Service ICT strategy. Audits will enable identification of:
A high level decision process map setting out the key issues to be considered is shown in the figure below. An Open Data publication checklist is included at Annex 2.
Decision Process Map
For a dataset to be considered as Open Data, it must be published under an Open Licence. The European Commission, as part of its ongoing work in relation to the Revision of the PSI Directive, has issued guidelines on recommended licences and datasets. These guidelines encourage “the use of open licences, which should eventually become common practice across the Union”.
Following a public consultation on options for Ireland’s Open Data Licence, 14 responses were received. There was broad support for the use of the Creative Commons Attribution 4.0 International (CC BY 4.0) licence from respondents. This licence lets others distribute, remix, tweak, and build upon data, even commercially, as long as users credit the original publisher for the original creation. CC-BY 4.0 is recommended for maximum dissemination and use of licensed materials. The proposed licence statement and recommended disclaimer statements should be used under the Open Data Initiative.
Recommendations
All data and metadata linked to data.gov.ie will be associated with the Creative Commons Attribution (CC-BY) Licence, at a minimum. Public bodies may waive copyright and associate datasets with CC0, if that is considered appropriate. The licence should be clearly identified in the metadata.
Only datasets associated with the recommended Open Data Licence may be included on data.gov.ie. However, datasets clearly associated with another licence, such as the PSI Licence, may be linked to the Open Data portal provided a commitment is made to using the Open Standard licence within a clearly defined timeframe.
Licence Statement
Under the CC-BY Licence, users must acknowledge the source of the Information in their product or application by including or linking to this attribution statement: “Contains Irish Government Data licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence”.
Multiple Attributions
If using data from several Information Providers and listing multiple attributions is not practical in a product or application, users may include a URI or hyperlink to a resource that contains the required attribution statements.
Disclaimer
All data linked to the Open Data Portal is published “as is”. The Information is licensed “as is” and the Information Provider and/or Licensor excludes all representations, warranties, obligations and liabilities in relation to the Information to the maximum extent permitted by law.
The Information Provider and/or Licensor are not liable for any errors or omissions in the Information and shall not be liable for any loss, injury or damage of any kind caused by its use. The Information Provider does not guarantee the continued supply of the Information.
Exemptions
This licence does not cover personal information, unless sufficiently anonymised and/or aggregated. Nor does it cover third party rights (including, but not limited to, patents, copyright, database rights or trademarks).
While data published in any format can be considered Open Data if associated with an Open Licence, the type of data format used can have significant implications for the usability of the data. Under the Open Data Initiative, public bodies should publish their data in the most open way possible. One way to measure the openness of the formats used is through the 5-star deployment scheme for Open Data. The greater the number of stars, the more reusable the data.
5-Star Open Data Scheme (See http://5stardata.info/)
For inclusion in the Open Data Portal, public bodies must publish data at a minimum of 3 Star Open Data, such as CSV, JSON or XML. However, public bodies are encouraged to publish datasets in multiple formats, for example 1 Star (e.g. PDF) and 2 Star (e.g. Microsoft Excel), in addition to the required 3 Star (e.g. CSV). Further examples of formats are available here.
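The 3 Star minimum can be checked mechanically. The following is a minimal sketch, assuming a lookup table built only from the example formats named above; the names `STAR_RATING` and `meets_portal_minimum` are hypothetical and not part of any data.gov.ie tooling.

```python
# Star ratings for the example formats named in the text.
# Illustrative only: a real check would need a fuller format list.
STAR_RATING = {
    "pdf": 1,   # 1 Star example from the text
    "xls": 2,   # 2 Star example (Microsoft Excel)
    "csv": 3,   # 3 Star examples
    "json": 3,
    "xml": 3,
}

MINIMUM_STARS = 3  # minimum required for inclusion in the portal


def meets_portal_minimum(formats):
    """Return True if at least one published format reaches 3 Stars."""
    return any(
        STAR_RATING.get(fmt.lower().lstrip("."), 0) >= MINIMUM_STARS
        for fmt in formats
    )


if __name__ == "__main__":
    print(meets_portal_minimum(["PDF", "csv"]))  # a 3 Star format is present
    print(meets_portal_minimum(["pdf", "xls"]))  # only 1 and 2 Star formats
```

Unknown formats are treated as zero stars here, so they never satisfy the minimum on their own.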
Recommendations
All datasets on http://data.gov.ie should be available in at least one of the following formats:
General | Geospatial | Domain-Specific |
CSV | GeoJSON | NetCDF |
JSON | GML | Datex II |
XML | KML | GTFS |
ODF | WKT | JSON STAT |
RDF | LAS | |
ODS | IFC | |
 | Shapefile | |
 | ASCII Grid | |
This list is subject to review and updating as new formats are developed due to technological developments.
If a public body intends to change a publication format, it should give prior notice to users stating the date from which the new format will be introduced. Sufficient time (three months suggested) should be given to allow users to make any necessary arrangements to ensure that they are not adversely affected by the change.
Datasets may be published in multiple formats.
Datasets not yet available in one of the recommended open formats should have a clear timeframe for when they will be made available in an open format.
The publication of data in open formats should be built into data publication processes of all public bodies, and as part of information management more generally.
In order to help realise the benefits of Open Data, public bodies should make their data more searchable and usable. To achieve this, public bodies should provide precise descriptors about their datasets to help in the identification, location and retrieval of online resources by data-users.
These descriptors are commonly known as “metadata”.
Metadata is the summary information describing the data, including the availability, nature and constituents of the data. It provides context about the data that helps users understand their meaning, such as:
The Open Data Initiative requires a consistent approach to the publication of Open Data to ensure interoperability between datasets published by public bodies, at both national and international levels.
Accordingly, this Technical Framework recommends the adoption of a standardised Metadata Schema by public bodies, namely the W3C Data Catalog Vocabulary (DCAT), and more specifically, the DCAT Application Profile for European Data Portals (DCAT-AP). DCAT-AP is being used in a number of European Open Data portals. An extracted Reference Guide to DCAT-AP is available in the table below.
One aspect of DCAT-AP that is lacking is geospatial metadata coverage. The EU DCAT-AP Working Group has identified the need to describe geospatial datasets, data series, and services. As a result, that Group is working on GeoDCAT-AP, an extension of DCAT-AP. For the purpose of data.gov.ie, the geospatial metadata properties defined in the table below will be included.
Class | Class URI | Mandatory properties | Recommended properties | Optional properties |
Catalogue | dcat:Catalog | dcat:dataset dct:description dct:publisher dct:title | dct:issued dct:language dct:license dct:modified dcat:themeTaxonomy foaf:homepage | dcat:record dct:rights dct:spatial |
Dataset | dcat:Dataset | dct:description dct:title | adms:contactPoint dcat:distribution dcat:keyword dcat:theme dct:publisher | adms:identifier adms:version adms:versionNotes dcat:landingPage dct:accrualPeriodicity dct:conformsTo dct:identifier dct:issued dct:language dct:modified dct:spatial dct:temporal |
Distribution | dcat:Distribution | dcat:accessURL | dct:description dct:format dct:license | adms:status dcat:byteSize dcat:downloadURL dcat:mediaType dct:issued dct:modified dct:rights dct:title |
Table: DCAT-AP Quick Reference of Classes and Properties. Extracted from DCAT-AP Specification Final v1.01 (Word Version)
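The mandatory properties in the quick reference table lend themselves to a simple completeness check before a record is submitted. The sketch below assumes metadata is held as a plain dictionary keyed by prefixed property names; the function name `missing_mandatory` is hypothetical, and only the mandatory column of the table is encoded.

```python
# Mandatory properties per class, copied from the quick reference table.
MANDATORY = {
    "dcat:Catalog": {"dcat:dataset", "dct:description", "dct:publisher", "dct:title"},
    "dcat:Dataset": {"dct:description", "dct:title"},
    "dcat:Distribution": {"dcat:accessURL"},
}


def missing_mandatory(class_uri, record):
    """Return the sorted list of mandatory properties absent from a record.

    A property counts as absent if the key is missing or its value is
    None, an empty string, or an empty list.
    """
    required = MANDATORY[class_uri]
    present = {key for key, value in record.items() if value not in (None, "", [])}
    return sorted(required - present)


if __name__ == "__main__":
    # Hypothetical dataset record with a title but no description.
    print(missing_mandatory("dcat:Dataset", {"dct:title": "Bus stops"}))
```

Recommended and optional properties are deliberately left out of the check, since their absence does not make a record invalid under the table above.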
Property | URI | Domain | Range | Usage note | Card. |
Geographic Bounding Box | gmd:EX_GeographicBoundingBox | dcat:Dataset | gmd:EX_GeographicBoundingBox | http://www.datypic.com/sc/niem20/t-gmd_EX_GeographicBoundingBox_Type.html | 0..1 |
Temporal Extent | dct:temporal | dcat:Dataset | dct:PeriodOfTime | This property refers to a temporal period that the Dataset covers. | 0..n |
Lineage | dct:provenance | dcat:Dataset | dct:ProvenanceStatement | This property contains a statement about the lineage of a Dataset. | 0..n |
Spatial Reference System | gmd:MD_ReferenceSystem | dcat:Dataset | gmd:MD_ReferenceSystem | http://www.datypic.com/sc/niem20/e-gmd_MD_ReferenceSystem.html (See table below) | 0..1 |
Spatial Resolution | gmd:MD_Resolution | dcat:Dataset | gmd:MD_Resolution | http://www.datypic.com/sc/niem20/e-gmd_MD_Resolution.html | 0..1 |
Conformance | dct:conformsTo | dcat:Dataset | dct:Standard | This property refers to an implementing rule or other specification. | 0..n |
Table: Geospatial metadata elements to be included in data.gov.ie
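The "Card." column in the table above states how many times each geospatial property may appear on a dataset (0..1 means at most once, 0..n means any number). A minimal sketch of checking those limits follows, assuming a dataset is represented as a dictionary whose values are either a single item or a list; `cardinality_violations` is a hypothetical helper name.

```python
# Maximum occurrences per property, taken from the "Card." column.
# None stands for "0..n" (no upper limit).
MAX_OCCURRENCES = {
    "gmd:EX_GeographicBoundingBox": 1,
    "dct:temporal": None,
    "dct:provenance": None,
    "gmd:MD_ReferenceSystem": 1,
    "gmd:MD_Resolution": 1,
    "dct:conformsTo": None,
}


def cardinality_violations(dataset):
    """Return the properties that occur more often than the table allows."""
    violations = []
    for prop, limit in MAX_OCCURRENCES.items():
        values = dataset.get(prop, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-item list
        if limit is not None and len(values) > limit:
            violations.append(prop)
    return violations


if __name__ == "__main__":
    # Hypothetical record that declares two reference systems.
    record = {
        "gmd:MD_ReferenceSystem": [
            "http://www.opengis.net/def/crs/EPSG/0/2157",
            "http://www.opengis.net/def/crs/EPSG/0/4326",
        ],
        "dct:conformsTo": ["spec-a", "spec-b"],
    }
    print(cardinality_violations(record))
```

All six properties have a lower bound of zero, so only the upper bound needs checking.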
Name | Acronym | EPSG Code | URI |
Irish Transverse Mercator | ITM | 2157 | http://www.opengis.net/def/crs/EPSG/0/2157 |
Irish Grid | | 29902 | http://www.opengis.net/def/crs/EPSG/0/29902 |
European Terrestrial Reference System 1989 | ETRS89 | 4258 | http://www.opengis.net/def/crs/EPSG/0/4258 |
Ireland 1975 Mapping Adjustment | (1953/1956?) | | |
World Geodetic System 1984 | WGS-84 | 4326 | http://www.opengis.net/def/crs/EPSG/0/4326 |
Table: Spatial Reference Systems
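Each URI in the table above follows the same shape: a fixed prefix followed by the EPSG code. The sketch below shows how such a URI could be derived from the code. It covers only the four systems for which the table gives an EPSG code, and the function name `crs_uri` is hypothetical.

```python
# EPSG codes copied from the Spatial Reference Systems table.
EPSG_CODES = {
    "Irish Transverse Mercator": 2157,
    "Irish Grid": 29902,
    "European Terrestrial Reference System 1989": 4258,
    "World Geodetic System 1984": 4326,
}

# Common prefix of every URI listed in the table.
OGC_EPSG_PREFIX = "http://www.opengis.net/def/crs/EPSG/0/"


def crs_uri(name):
    """Build the reference system URI for a system named in the table."""
    return f"{OGC_EPSG_PREFIX}{EPSG_CODES[name]}"


if __name__ == "__main__":
    print(crs_uri("Irish Transverse Mercator"))
```

A name that is not in the table raises `KeyError`, which avoids silently emitting a URI for an unknown reference system.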
Recommendations
All Open Data must be accompanied by standardised metadata.
All metadata must be accompanied by the Open Licence.
DCAT-AP will be adopted as the Open Data Initiative’s Metadata Schema, with appropriate geospatial values outlined.
All datasets on http://data.gov.ie will be accompanied by metadata compliant with DCAT-AP (with the Geo extension, if appropriate to the dataset).
This Metadata Schema includes three categories of metadata, as set out in Table 3 above:
Data standards, also referred to as data models or data vocabularies, ensure a common understanding of data content and what it describes to data users; and help facilitate the smooth exchange of data. Standards ensure that data is published in a permanent, persistent and consistent way.
Data standards help give a common meaning to data. This is especially important when data is being used by a third-party, being integrated from different sources, or when data is being shared across public bodies. Data standards not only define the meaning of certain concepts, but also how concepts relate to each other, which facilitates data interoperability.
When publishing Open Data, international standards defined by reputable standards organisations, such as ISO, the European Commission, W3C, IETF, OGC and OASIS, should be used if possible. If international standards are unavailable or unsuitable, use national standards. For specific topics such as geospatial, statistics, or health, use national standards as defined by the responsible organisation (OSi, CSO, HIQA, etc.).
The Public Bodies Working Group (PBWG) reviewed the commonly used data standards by Irish Public Bodies. These are defined in the table below. This is not an exhaustive list and is designed to be a go-to point for data publishers. The list will be updated with new standards as they are adopted in general practice.
Short Title | Title | Domain | Standardisation Body | URL |
---|---|---|---|---|
AR-DRG | Australian Refined Diagnosis Related Group | Health | Australian Government | http://www.aihw.gov.au/hospitals-data/ar-drg-data-cubes/ |
ATC/DDD | The Anatomical Therapeutic Chemical Classification System with Defined Daily Doses | Chemical | WHO | http://www.who.int/classifications/atcddd/en/ |
COICOP | Classification of Individual Consumption According to Purpose | Consumption | UN Statistics Division | http://unstats.un.org/unsd/cr/registry/regcst.asp?Cl=5 |
CSO Standard Classifications | CSO Standard Classifications | Multiple | CSO | http://www.cso.ie/en/surveysandmethodology/classifications/standardclassifications/ |
CSO Standards | CSO Standard Classifications | Statistics | CSO | http://www.cso.ie/en/surveysandmethodology/classifications/standardclassifications/ |
DataCube | Data Cube Vocabulary | Statistical | W3C | http://www.w3.org/TR/vocab-data-cube/ |
DCAT | Data Catalog Vocabulary | Metadata | W3C | http://www.w3.org/TR/vocab-dcat/ |
DCMI | Dublin Core Metadata Initiative | Metadata | Dublin Core | http://dublincore.org/documents/dcmi-terms/ |
Disadvantage Index | Disadvantage index | ERC | ? | |
EUCAN | Common Cancers | Cancer | WHO | http://eco.iarc.fr/eucan/Default.aspx |
IANA | IANA Media Types | Media/File Types | Internet Assigned Numbers Authority | http://www.iana.org/assignments/media-types/media-types.xhtml |
IATI | International Aid Transparency Initiative | Transparency | IATI | http://iatistandard.org/ |
ICCS | Irish Crime Classification System | Crime | CSO | http://www.cso.ie/en/media/csoie/releasespublications/documents/crimejustice/current/crimeclassification.pdf |
ICD | International Classification of Diseases | Health | WHO | http://www.who.int/classifications/icd/en/ |
ISO 19100 | 19100 Geographic Information standard series developed by the International Organization for Standardization (ISO) | Geospatial | ISO/OGC | http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_tc_browse.htm?commid=54904&published=on&includesc=true |
INSPIRE | Infrastructure for Spatial Information in the European Community | Spatial / Environmental | EC | http://inspire.ec.europa.eu/ |
ISO 3166-2:IE | Country codes and subdivisions | | ISO | http://www.iso.org/iso/iso_3166-2_newsletter_ii-3_2011-12-13.pdf |
ISO 639 | Language codes | Language | ISO | http://www.iso.org/iso/home/standards/language_codes.htm |
ISO 8601 | Date and time format | Date/Time | ISO | http://www.iso.org/iso/home/standards/iso8601.htm |
ISO 4217 | Currency codes | Multiple | ISO | http://www.iso.org/iso/home/standards/currency_codes.htm |
MDC | Major Diagnostic category | Health | Utah Department of Health | http://health.utah.gov/opha/IBIShelp/codes/MDC.htm |
NACE Rev.2 | NACE Rev.2 | Metadata | Eurostat | http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL&StrNom=NACE_REV2&StrLanguageCode=EN&IntPcKey=&StrLayoutCode=HIERARCHIC&CFID=1110191&CFTOKEN=3ca0f6dadb71d377-1F2DE4F0-F7BF-BCAE-31C18C386EA88F92&jsessionid=f900daad75c14b465532m |
NUTS | Nomenclature of territorial units for statistics | | EC | http://ec.europa.eu/eurostat/web/nuts/overview |
SDMX | Statistical Data and Metadata eXchange | Statistical | SDMX | http://sdmx.org/ |
Total poverty index | Total poverty index | ERC | ? | |
XBRL | eXtensible Business Reporting Language | Business | XBRL | http://www.xbrl.org/ |
The 26 geographic counties, except that Tipperary NR and Tipperary SR are distinguished | CSO | ? | ||
The 26 geographical counties | CSO | ? | ||
The 32 geographic counties of Ireland and Northern Ireland | CSO | ? | ||
The 34 administrative counties, except that Tipperary NR and Tipperary SR are combined | CSO | ? | ||
The 34 administrative counties | CSO | ? |
Table: Recommended Data Standards for data.gov.ie
Recommendations
Use the table above as a reference of data standards commonly used for Open Data in Ireland.
When publishing Open Data, public bodies should first try to reuse international standards defined by reputable standards organisations, such as ISO, the European Commission, W3C, IETF, OGC and OASIS.
If international standards are unavailable or unsuitable, use national standards. For specific topics such as geospatial, statistics, or health, promote national standards defined by the responsible organisation (OSi, CSO, HIQA, etc.).
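As one illustration of applying a listed standard, dates can be normalised to ISO 8601 before publication. This is a minimal sketch using only the Python standard library. The day-first input format is an assumption about how a source system might store dates, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def normalise_date(text, source_format="%d/%m/%Y"):
    """Convert a date string in a known source format to ISO 8601 (YYYY-MM-DD).

    The default source format (day/month/year) is an assumption.
    """
    return datetime.strptime(text, source_format).date().isoformat()


def to_iso8601_timestamp(moment):
    """Render a timezone-aware datetime as an ISO 8601 timestamp."""
    return moment.isoformat()


if __name__ == "__main__":
    # 1 June 2015 is the publication date given at the top of this document.
    print(normalise_date("01/06/2015"))
    print(to_iso8601_timestamp(datetime(2015, 6, 1, 9, 30, tzinfo=timezone.utc)))
```

Publishing dates in one unambiguous form avoids the day-first versus month-first confusion that arises when datasets from different sources are combined.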
The Technical Framework sets out steps that will allow public bodies to achieve a minimum 3 Star Open Data. In the longer term, however, the intention is to progress to greater levels of linked data (4 and 5 Star).
The use of Uniform Resource Identifiers (URIs) is an important element of this longer term approach to Open Data. The Public Bodies Working Group will be tasked with developing a URI Strategy and agreeing a URI pattern for use under the Open Data Initiative, drawing on international experience and best practice.
The ongoing development of Open Data and the desire to increase its interoperability have led to an increased reliance on URIs as identifiers for a wide variety of concepts; everything from languages to buildings, public bodies to currencies. URIs are valuable in that they can help distinguish data resources and facilitate unique data identification, comparison and linking. URIs can be used to identify anything from places and people to things and concepts.
It is intended that the outcome of the work of the PBWG will be persistent and scalable URI patterns that will continue to be used even when public bodies change and applications using URIs expand.
Examples of URI patterns
International research indicates that the elements under consideration for inclusion in the National URI Pattern should include:
Other possible elements include: {namespace} (for new URI sets placed under common governance), {sector} (the same categories that are included in data.gov.ie), and {language}.
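To make the idea of a URI pattern concrete, the sketch below fills placeholders in a template string. The pattern shown is purely hypothetical: no national pattern has been agreed in this document, and the `{domain}` and `{reference}` elements and the `data.example.org` host are assumptions introduced for illustration. Only `{sector}` and `{language}` come from the text above.

```python
from urllib.parse import quote

# Hypothetical pattern for illustration only; not an agreed national pattern.
HYPOTHETICAL_PATTERN = "http://{domain}/{sector}/{language}/{reference}"


def build_uri(pattern, **elements):
    """Fill a URI pattern, percent-encoding each element value.

    Raises KeyError if the pattern names an element that was not supplied.
    """
    encoded = {name: quote(str(value), safe="") for name, value in elements.items()}
    return pattern.format(**encoded)


if __name__ == "__main__":
    print(
        build_uri(
            HYPOTHETICAL_PATTERN,
            domain="data.example.org",
            sector="transport",
            language="en",
            reference="bus stop 42",
        )
    )
```

Keeping the pattern as data rather than hard-coding it in string concatenation means the same function continues to work if the agreed pattern later gains or loses elements.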
This is a general Glossary of Open Data Terms and Acronyms, for use as a reference guide for the Open Data Initiative. This Glossary will be expanded and enhanced as required.
Open Data
Data broadly refers to information, rendered in a human- or computer-readable manner, which may be the subject of research or a raw product of research. A dataset may be considered Open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike. (Open Knowledge Foundation - opendefinition.org)
Open Government Data
Data which has been produced or gathered by public bodies during the course of business activities, and published under an Open Licence.
Data Protection
Data protection legislation protects privacy rights of individuals in relation to the processing of their personal data.
When published openly, datasets must not identify individuals.
Anonymisation and Aggregation
Anonymisation and aggregation can be used to ensure that datasets relating to human subjects comply with relevant data protection legislation before publication.
Anonymisation involves the redaction of information from a dataset where individuals could previously have been identified.
Aggregation involves the publication of a dataset in summary form to exclude personal information which would allow an individual to be identified.
Guidance and information on anonymisation is available on the website of the Office of the Data Protection Commissioner here.
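The two techniques can be sketched on a small invented dataset. The records and field names below are fabricated sample values for illustration only. Note also that removing directly identifying fields is only a first step: whether a dataset is sufficiently anonymised depends on the remaining fields too, which this sketch does not assess.

```python
from collections import Counter


def anonymise(records, identifying_fields):
    """Return copies of the records with the identifying fields removed."""
    return [
        {key: value for key, value in record.items() if key not in identifying_fields}
        for record in records
    ]


def aggregate_by(records, key):
    """Summarise records as a count per distinct value of one field."""
    return dict(Counter(record[key] for record in records))


if __name__ == "__main__":
    # Invented sample records; not real data.
    records = [
        {"name": "Person A", "county": "Cork", "age": 34},
        {"name": "Person B", "county": "Cork", "age": 51},
        {"name": "Person C", "county": "Galway", "age": 29},
    ]
    print(anonymise(records, {"name"}))
    print(aggregate_by(records, "county"))
```

The aggregated output contains no per-person rows at all, which is why aggregation is often the simpler route when individual-level detail is not needed by data users.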
Copyright
Copyright is an area of Intellectual Property law which covers original creative works including literary, dramatic, musical and artistic works, film, sound recordings, broadcasts and the typographical arrangement of published editions, computer software and non-original databases, and performances. Copyright exists from the moment the work is created, and does not require any registration of the work. In the case of a dataset, copyright may belong to an employer, a government department, a funder or another party, depending on the contract surrounding the creation of the data.
Licence
Licensing allows copyright owners to permit approved use and reuse of their work, without relinquishing copyright fully. Licensing can permit both commercial and non-commercial reuse of a work, depending on the terms of the licence, and licences may last in perpetuity or for a specified period. The application of a licence does not mean that a copyright statement should not be applied to a work, and many licences such as Creative Commons suggest that the copyright holder is credited. Open Data is usually associated with an Open Licence such as CC-BY (Creative Commons Attribution Only) or a Public Domain Dedication such as CC0.
CC0
CC0/ Public Domain Dedication or “No Rights Reserved” is not truly a Creative Commons licence, as it does not reserve any rights in a copyright work. Assigning a Public Domain Dedication to a work relinquishes all rights in it, and allows use and re-use of the work for any purpose, without credit to the original author. Essentially this dedication allows works to enter the public domain before the legal term of copyright protection has ended. A Public Domain Dedication cannot be revoked.
Attribution Licence
A licence requiring that the original source of the licensed material is cited (attributed).
CC-BY (Creative Commons Attribution)
This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This licence is recommended for maximum dissemination and reusability of licensed materials.
Machine-readable format
Machine-readable formats are those containing structured data which can be extracted and analysed in an automated way. Examples of machine readable formats include those with a tabular structure such as .xls and .csv, as well as formats such as XML and JSON which are more flexible.
Generally, popular human-readable formats such as Word documents, PDF and HTML include formatting and display information which means that they are not machine-readable.
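What "extracted and analysed in an automated way" means in practice can be shown with a few lines of standard-library Python. The CSV content below is invented sample data, used only to demonstrate that a tabular machine-readable file can be parsed, summed and converted to JSON without manual intervention.

```python
import csv
import io
import json

# Invented sample data standing in for a published CSV file.
CSV_TEXT = "county,population\nCarlow,100\nCavan,200\n"


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def total_population(rows):
    """Sum the population column across all rows."""
    return sum(int(row["population"]) for row in rows)


if __name__ == "__main__":
    rows = read_rows(CSV_TEXT)
    print(total_population(rows))
    # The same rows can be re-serialised as JSON, another machine-readable format.
    print(json.dumps(rows))
```

The equivalent figures locked inside a PDF or Word document would first need to be located and retyped or scraped, which is the practical difference the definition above is drawing.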
Open formats
An open format is one where the specifications for the software are available to anyone, free of charge, so that anyone can use these specifications in their own software without any limitations on reuse imposed by intellectual property rights. Open formats include .csv and .xml.
Proprietary formats
A format is proprietary if it encodes data so that a file is readable only by using the same type of software used to create the file. Proprietary software does not openly publish its specifications for reuse. Proprietary formats include .xls (created in Microsoft Excel) and .docx (created in Microsoft Word).
Application Programming Interface
An Application Program Interface (API) is a set of routines, protocols, and tools for building software applications. Programs that use a common API will have similar user interfaces, making it easier for users to learn new programs. (W3C eGov Glossary)
ASCII American Standard Code for Information Interchange
CC Creative Commons
CKAN Comprehensive Knowledge Archive Network
CSO Central Statistics Office of Ireland
CSV Comma-Separated Values
DCAT Data Catalog Vocabulary
DCAT-AP DCAT Application Profile
DCMI Dublin Core Metadata Initiative
EC European Commission
ETRS89 European Terrestrial Reference System 1989
GeoJSON Geo JavaScript Object Notation
GML Geography Markup Language
GTFS General Transit Feed Specification
HIQA Health Information and Quality Authority
IATI International Aid Transparency Initiative
IETF Internet Engineering Task Force
IFC Industry Foundation Classes
INSPIRE Infrastructure for Spatial Information in the European Community
ISO International Organization for Standardization
ITM Irish Transverse Mercator
JSON JavaScript Object Notation
KML Keyhole Markup Language
LAS Log ASCII Standard
NetCDF Network Common Data Form
OASIS Organization for the Advancement of Structured Information Standards
ODF OpenDocument Format
OGC Open Geospatial Consortium
ODS Open Document Spreadsheet
OSi Ordnance Survey Ireland
PBWG Public Bodies Working Group
PDF Portable Document Format
PSI Public Sector Information
RDF Resource Description Framework
SDMX Statistical Data and Metadata eXchange
URI Uniform Resource Identifier
W3C World Wide Web Consortium
WGS World Geodetic System
WHO World Health Organization
WKT Well Known Text
WFS Web Feature Service
WMS Web Map Service
XBRL eXtensible Business Reporting Language
XML Extensible Markup Language
The Final Checklist
For a final check, use a dataset preparation checklist:
Dataset Preparation Checklist (from the EDP Open Data Goldbook for Data Managers and Data Holders)
The Open Data Unit greatly values your feedback, questions or comments. We are always happy to hear from you.