Engagement Fund: Linked Data Publishing for data.gov.ie: Seminars, Methods and Tools - Christophe Debruyne and Declan O'Sullivan, ADAPT Centre, TCD.

July 17, 2017

Linked Data Publishing for data.gov.ie: Seminars, Methods and Tools

Christophe Debruyne

ADAPT Centre, Trinity College Dublin

christophe.debruyne@adaptcentre.ie

Declan O’Sullivan

ADAPT Centre, Trinity College Dublin

declan.osullivan@adaptcentre.ie

Most of the datasets currently hosted on the data.gov.ie portal have an openness score of 3 or lower, which means they are not available as RDF and, as a consequence, are also not linked with other internal or external Linked Data datasets.

We applied to the Open Data Engagement Fund for two projects: an Outreach and Engagement project, led by Prof. Declan O’Sullivan, informing the public about Linked Data and the generation of RDF from non-RDF resources; and an Innovative Use of Data project, driven by Dr. Christophe Debruyne, on linking datasets with Ordnance Survey Ireland’s authoritative Linked Data.

While the former informed and instructed the public on how to generate RDF, the latter provided a lightweight method and a set of tools to interlink datasets and generate enriched CSV files. By enriched CSV files, we mean CSV files that contain additional columns with links (URIs) to other resources. Some of the tools developed in the second project were used in the hands-on session of the seminar. Given the synergies between the two projects, we report on both in this blog post.

Seminars on Linked Data and Generating RDF Datasets

The proposal for the seminars was entitled “Uplift: generating RDF from non-RDF resources” and its goal was to organize a two-day practical workshop. The first day was dedicated to introducing the concepts, standards, best practices and guidelines in Linked Open Data. The second day was dedicated to tutorials on transforming non-RDF data sources into RDF datasets for either 1) publication on the Web as Linked Data, 2) inclusion in the data.gov.ie portal as an RDF dataset, or 3) providing data consumers with a way to engage, using semantic technologies, with the datasets published on the platform. The objectives of these seminars were thus to

 

  1. Inform people about Linked Data;
  2. Instruct the public how to distil RDF and Linked Data from datasets available on data.gov.ie;
  3. Demonstrate how RDF can be used to support activities or build novel applications.

The seminars thus covered the RDF generation process, whose output may be fit for publication according to Linked Data principles, but the actual problems of interlinking were not discussed during the seminars -- participants were pointed to the results of the second project.

Thanks to the support provided by the Open Data Engagement Fund, the two seminars were organized free of charge. Tickets for both events were made available via Eventbrite and the news was spread via various mailing lists. Both seminars attracted around 40 participants: not only students and academics, but also participants from industry and public administration.

The first seminar took place on the 4th of May at Ordnance Survey Ireland, who generously lent us the O’Donovan room and supported us in organizing the event. Colin Bray, Chief Executive and Chief Survey Officer of OSi, and Prof. Declan O’Sullivan, Head of Intelligent Systems and Principal Investigator in the School of Computer Science and Statistics of Trinity College Dublin, welcomed the participants and introduced the speakers. The presentations that were given -- available at http://bit.ly/odef-linked-data -- were:

  • An introduction to Linked Data by Christophe Debruyne
  • Linked Data at the Ordnance Survey Ireland also by Christophe Debruyne
  • Linked Data at the Central Statistics Office by Eoin McCuirc
  • A presentation on enriching data with Linked Data by Dr. Kevin Koidl

While the first presentation gave the audience a general idea of Linked Data in terms of concepts, terminology, etc., the next two presentations informed the audience how these principles were implemented in two bodies of the Irish public administration. These two presentations furthermore highlighted the motivation for Linked Open Data and its (potential) impact. The last presentation demonstrated, with a live demo, how Linked Data can be used to build applications once it has been published. Dr. Kevin Koidl demonstrated how unstructured content can be “tagged” with Linked Data URIs using an open source framework called FREME (https://github.com/freme-project).

 

Christophe Debruyne presenting on Linked Data

Eoin McCuirc presenting the Linked Data project at the Central Statistics Office

The second seminar was held on the 10th of May in Trinity College Dublin. This seminar, which lasted the whole day, was organized as follows: the morning was dedicated to the RDB to RDF Mapping Language (R2RML), a W3C Recommendation for declaring how relational databases (or tabular data in general) are mapped to RDF datasets. During that presentation, other (standardized) initiatives were briefly covered, and an argument for using declarative and standardized approaches was given. The second part of the day was dedicated to tutorials in which the participants were challenged to generate RDF from CSV files available on the data.gov.ie portal. Both the presentations and the material for the tutorials are available at http://bit.ly/uplift-r2rml. The tutorials used ADAPT’s implementation of the R2RML engine [1], though other (commercial) implementations can be adopted.

 

R2RML tutorial to generate RDF datasets from tabular data in TCD

During the tutorial, several participants -- while convinced of the role of Linked Data -- had trouble understanding who should be responsible for producing these datasets and, more importantly, how. While the general consensus was that one could be trained in these technologies, the need for adequate methods and tools emerged. One of the questions was whether there were tools that facilitated the creation of these mappings, or guided one in creating them.

In [2], we reported on a lightweight methodology and a set of tools for enriching non-RDF data such as CSV with Linked Data via Semantic Technologies. One of these tools, available at [3], generates an R2RML mapping that simulates a direct mapping -- one that generates RDF directly reflecting the structure of the original source, including the vocabulary. This tool seemed to answer a particular need: it allows one to generate RDF that is not necessarily meaningful in terms of adopted vocabularies, but it also allows one to start from a mapping that can be incrementally changed. Subsequent minor edits to the mapping allowed the participants to see how meaning is added to the RDF.
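To give an idea of what a direct mapping produces, here is a minimal Python sketch. It is not the Generate Mapping tool [3] and it does not use R2RML; it only mimics the outcome, with one subject per row and one predicate per column. The base IRI, the table name and the function name direct_mapping are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI, not one used by the project
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(csv_text, table="stations", base=BASE):
    """Return N-Triples lines: one subject per row, one predicate per column."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}{table}/row/{row_number}>"
        # Every row becomes an instance of a class named after the table.
        triples.append(f"{subject} <{RDF_TYPE}> <{base}{table}> .")
        for column, value in row.items():
            if column is None or value is None or value == "":
                continue  # missing values produce no triple
            # The column header is reused as-is, so the vocabulary simply
            # mirrors the structure of the source file.
            predicate = f"<{base}{table}#{quote(column)}>"
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    sample = "name,county\nDublin Airport,Dublin\n"
    for line in direct_mapping(sample):
        print(line)
```

The resulting predicates carry no shared meaning yet. Replacing them one by one with terms from existing vocabularies corresponds to the incremental edits to the mapping described above.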

We also referred to ongoing research on representations that facilitate the creation, maintenance and interpretation of mappings, one of which is currently being investigated in ADAPT [4]. A large share of the participants were able to generate RDF by following the examples and the tutorial, and we were pleased to see that some with no background in semantic technologies were able to finish the whole tutorial in the afternoon.

A Method and Tools for Enriching CSV with Authoritative Geospatial Data

The goal of this project is to add a geospatial dimension to datasets available on data.gov.ie using Ordnance Survey Ireland’s (OSi’s) authoritative Linked Data datasets available on data.geohive.ie, which are the result of an ongoing collaboration between the OSi and the ADAPT Centre.

We have proposed a lightweight methodology for i) transforming CSV files into RDF – a process called uplift, which allows us to ii) create links with other datasets, especially the Linked Data datasets provided by the Ordnance Survey Ireland, and iii) transform these results back into a CSV file with the geospatial dimension – a process we call downlift. The method furthermore describes how one can engage with the geospatial dimension, which we elaborate on later in this section.

 

 

A lightweight methodology for adding a geospatial dimension to CSV files.
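The end result of the three steps, an enriched CSV file, can be illustrated with a small Python sketch. This is not the project's tooling: the uplift to RDF and the querying are skipped, and the linking step is replaced by a plain dictionary lookup. The county URIs, the column names and the function name enrich_csv are hypothetical.

```python
import csv
import io

# Hypothetical lookup standing in for the linking step. In the methodology,
# links are created against OSi's Linked Data, not read from a dictionary.
COUNTY_URIS = {
    "dublin": "http://example.org/county/dublin",
    "cork": "http://example.org/county/cork",
}


def enrich_csv(csv_text, name_column="county", link_column="county_uri",
               links=COUNTY_URIS):
    """Return the CSV text with an extra column holding a URI per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [link_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        key = (row.get(name_column) or "").strip().lower()
        # Records without a match keep an empty cell instead of being dropped.
        row[link_column] = links.get(key, "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(enrich_csv("station,county\nDublin Airport,Dublin\nValentia,Kerry\n"))
```

The original columns are left untouched, so the enriched file remains usable by anyone who works only with CSV.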

The method has been published in a paper [2], which was presented at the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, co-located with SIGMOD and PODS, and was well received. The reviewers particularly welcomed the idea of downlift: transforming back to non-RDF, yet enriched, formats.

The tools for uplift, downlift and generation of RDF have all been made available with accessible MIT licenses (see the references below).

To demonstrate how users can engage with the data, we proposed to use a client-side query interface for RDF that we have extended to support GeoSPARQL (an OGC standard for representing and querying geospatial data on the Linked Data Web). This work, developed in the context of our ongoing collaboration with the OSi, was presented at LDOW at WWW 2017 [5].

This client-side query interface can be used both to build applications that engage with the enriched data and to build a web interface for combining, exploring and querying OSi’s data with the augmented and enriched CSV data. An example is shown in the figure below, which uses weather station data published on data.gov.ie.

 

Combining OSi’s Linked Data with that of an RDF representation of a CSV file.

In the figure above, we show how we combine OSi’s Linked Data with that of an RDF representation of a CSV file using Triple Pattern Fragments [6]. The querying of data from two different endpoints (in this case OSi’s server (A) and the local machine (B)) is called a federated query. The query depicted in (C) creates triples between resources in both datasets based on a condition. In this case, a record is in a county if that record’s point is within that county’s geometry.
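The condition "a record's point is within a county's geometry" can be sketched in plain Python. This is a stand-in for the GeoSPARQL functions evaluated by the query interface, not the implementation from [5]: it uses a simple ray-casting test on a polygon without holes, and the record identifiers, county identifiers and coordinates are made up.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how often a ray going right from the point crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_records(records, counties):
    """Return (record, 'within', county) links for every point inside a county."""
    links = []
    for record_id, point in records.items():
        for county_id, polygon in counties.items():
            if point_in_polygon(point, polygon):
                links.append((record_id, "within", county_id))
    return links


if __name__ == "__main__":
    # Hypothetical data: one square "county" and two weather stations.
    counties = {"county-a": [(0, 0), (2, 0), (2, 2), (0, 2)]}
    records = {"station-1": (1, 1), "station-2": (3, 1)}
    print(link_records(records, counties))
```

Real county boundaries are far more detailed and may consist of several polygons, which is one reason to rely on a standard such as GeoSPARQL and on authoritative geometries rather than on a hand-written test.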

In summary, the outcomes of this project currently are 1) a paper describing how one can enrich CSV files with authoritative geospatial data as provided by the OSi, which describes 2) a method and 3) tools with accessible licenses to achieve this.

The beneficiaries of this project are data.gov.ie and data providers. Using this method, data.gov.ie will be able to host data with an explicit authoritative spatial component, and data providers will have access to best practices and guidelines to provide that spatial component in their datasets. We hope that our work will contribute to the provision of aligned open datasets on the portal.

In conclusion

In this blog post, we reported on the results of two initiatives funded by the Open Data Engagement Fund: seminars on Linked Data and Linked Data Publishing with R2RML, and a project that uses R2RML to generate RDF from datasets in order to create links with Linked Data datasets (in this case OSi’s geospatial data) and so enrich the original files with an authoritative geospatial dimension.

The two projects had many synergies, and results obtained in the second project were used to inform the participants of the seminars. While the seminars focused on the generation of RDF for linked data (four star), the project focused on the creation of links with other Linked Data datasets (five star). While the concepts of Linked Data and RDF dataset generation were already a lot to cover in only two days, participants did express their interest in a seminar dedicated to discovering and creating links, something that we, as the ADAPT Centre, are considering looking into.

Acknowledgements

The ADAPT Centre for Digital Content Technology (https://www.adaptcentre.ie/) is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. We gratefully acknowledge the support provided by the Irish Government Department of Public Expenditure and Reform’s Open Data Engagement Fund initiative to undertake these projects.

References

  1. R2RML Implementation: https://opengogs.adaptcentre.ie/debruync/r2rml
  2. Christophe Debruyne, Kris McGlinn, Lorraine McNerney, and Declan O'Sullivan. 2017. A lightweight approach to explore, enrich and use data with a geospatial dimension with semantic web technologies. In Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data (GeoRich '17). ACM, New York, NY, USA, Article 1, 6 pages. DOI: https://doi.org/10.1145/3080546.3080548
  3. Generate Mapping Tool: https://opengogs.adaptcentre.ie/debruync/generate-mapping
  4. Ademar Crotti Junior, Christophe Debruyne, Declan O'Sullivan: Juma: an Editor that Uses a Block Metaphor to Facilitate the Creation and Editing of R2RML Mappings. ESWC (Satellite Events) 2017
  5. Christophe Debruyne, Eamonn Clinton, Declan O'Sullivan: Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments. LDOW@WWW 2017
  6. Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. 37-38: 184-206 (2016)