Archive

Posts Tagged ‘Linked Data’

Why use the Web of Data?

January 29, 2013

A few days ago I had the pleasure, and the good fortune, of taking part in the webinar series organised by AIMS. The goal I had set for my presentation (in French), entitled “Clarifier le sens de vos données publiques avec le Web de données” (“Clarifying the meaning of your public data with the Web of Data”), was to demonstrate the advantage of using the Web of Data from the point of view of the data provider, by way of the consumer. Giving a presentation without any feedback from the audience was an interesting experience that I would gladly repeat if a new opportunity comes up. Especially if Imma and Christophe are in charge! Thanks to them everything was perfectly organised and the webinar went smoothly :-)

If you want to see whether this presentation achieves its goal, the slides are available on SlideShare:

Another copy of this presentation is available on the AIMS SlideShare account.

Behind the scenes of a Linked Data tutorial

November 27, 2012

Last week, on the afternoon of November 22, I co-organized a tutorial about Linked Data aimed at researchers from the digital humanities. The objective was to give a basic introduction to the core principles in a very hands-on setting, so that everyone could get concrete experience with publishing Linked Data.

Everyone listening to Clement speaking about Linked Data and RDFa

To prepare this event, I teamed up with Clement Levallois (@seinecle) from the Erasmus University in Rotterdam. He is a historian of science with interests in network analysis, text processing and other areas of the digital humanities. He had only heard of Linked Data and was eager to learn more about it. We started off by preparing a presentation skeleton and the setup for the hands-on session together. During this, he shouted every time I used a word he deemed too complex (“dereferencing”, “ontology”, “URI”, “reasoning”, …). In the end, “vocabulary” and “resource” are most probably the two most technical concepts that made it through. Then I took care of writing the slides, and he simplified them again before the tutorial. He was also the one who presented them; I just stood on the side the whole time.

The result: a researcher from the digital humanities explaining to a full room of fellow researchers what Linked Data is and how it can be useful to them. Everyone was very interested and managed to annotate some HTML pages with RDFa, thereby creating a social network of foaf:knows relations among the individuals they described :-) We concluded the tutorial by plotting that network using a tool that Clement developed.

This was a very efficient and interesting collaboration! For those interested in what we did, all the material is available on Dropbox and the presentation is on SlideShare:

5-stars Linked Open Data pays more than Open Data

Let’s assume you are the owner of a CSV file with some valuable data. You derive some revenue from it by selling it to consumers who do traditional data integration. They take your file, import it into their own data storage solution (for instance, a relational database) and deploy applications on top of this data store.

Traditional data integration

Data integration is not easy, and you’ve been told that Linked Open Data facilitates it, so you want to publish your data as 5-star Linked Data. The problem is that the first star is about an “Open license” (follow this link for an extensive description of the 5-star scheme), and that sounds at odds with the idea of making money by selling the data :-/

If you publish your CSV as-is, under an open license, you get 3 stars but don’t make any money from serving it. Trying to get 4 or 5 stars means more effort from you as a data publisher and will cost you some money, still without earning anything back…

Well, let’s look at this 4th star again. Going from 3 stars to 4 means publishing descriptions of the entities on the Web. Every data item gets a Web page of its own, with the structured data associated with it. For instance, if your dataset contains a list of cities with their population, each of these cities has its own URI, and the page behind that URI gives the population. From that point, you get the 5th star by linking these pages to other pages published as Linked Open Data.
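To make the 4th star a bit more concrete, here is a minimal sketch of how a city/population CSV could be turned into one URI per city, serialised as N-Triples. The base URI, the population property and the sample figures are all made up for the illustration; a real publisher would use a domain they control and, ideally, an existing vocabulary.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical names, chosen only for this example.
BASE = "http://example.org/city/"
POPULATION = "http://example.org/vocab/population"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def city_uri(name):
    """Mint one URI per data item, which is what the 4th star asks for."""
    return BASE + quote(name.strip().replace(" ", "_"))


def csv_to_ntriples(csv_text):
    """Turn 'city,population' rows into one N-Triples line per city."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = city_uri(row["city"])
        population = int(row["population"])
        lines.append(
            f'<{subject}> <{POPULATION}> "{population}"^^<{XSD_INTEGER}> .'
        )
    return lines


# Made-up sample data, not real population figures.
sample = "city,population\nAmsterdam,800000\nThe Hague,500000\n"
for triple in csv_to_ntriples(sample):
    print(triple)
```

Each printed line describes one city under its own URI. The 5th star would then come from adding triples that point from these URIs to resources published by others.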

Roughly speaking, your CSV file is turned into a website, and this is how you can make money out of it. As with any website, visitors can look at individual pages and do whatever they want with them. They cannot, however, dump the entire website onto their machine. Those interested in getting all the data can still buy it from you, either as a CSV or as an RDF dump.

Users of your data have the choice between two ways of using it: use parts of the data through the Linked Open Data access, or buy it all and integrate it. They are free to choose the best solution for them depending on their needs and resources.

Using Linked Open Data

Some side benefits of going 5-star instead of stopping at 3:

  • Because part of the data is open for free, you can expect more users to examine it and report errors;
  • Other data publishers can easily link their data set with yours by re-using the URIs of the data items. This increases the value of the data;
  • In its RDF form, the data set can contain internal links, which does part of the data integration work on behalf of the data consumers, who will be grateful for it!
  • Users can deploy a variety of RDF-enabled tools to consume your data in various ways.

Sounds good, doesn’t it? So, why not publish all your 3-star data as 5-star right away? ;-)

Downscaling Entity Registries for Poorly-Connected Environments

VeriSign logo (Photo credit: Wikipedia)

Emerging online applications based on the Web of Objects or Linked Open Data typically assume that connectivity to data repositories and entity resolution services is always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Internet access. Many data-driven applications could have a critical impact on the lives of those people, but are inaccessible to them because of the architecture of today’s data registries.

Examples of data registries include the domain name registries. These are databases containing registered Internet domain names. They are necessary for all Web users wishing to visit a website knowing its URL (e.g. http://semweb4u.wordpress.com) rather than its IP address (e.g. http://76.74.254.120). Another example of a data registry is the Digital Object Architecture (DOA), which assigns unique identifiers to digital objects (e.g. scientific publications).

Registries are critical components of today’s Internet architecture. They are widely used in everyday Web activities, but their usage is severely impaired in poorly connected or ad-hoc environments. In this context, centralized data management, as typically used by current data registries, is of limited practicability, if it is possible at all. There is a need for hybrid models mixing decentralized and hierarchical infrastructures to support data-driven applications in environments with limited Internet connectivity.

Philippe Cudré-Mauroux and I received a $200,000 research grant from VeriSign Inc. (PDF version) to investigate such novel approaches for data registries. During this 12-month project, we will develop decentralized solutions to the problems of entity publication, search, de-duplication, storage and caching. A running prototype will be tested on the XO laptop, a laptop used by young learners in developing countries, most often in a mesh context with limited Internet connectivity.

Please don’t hesitate to contact us for information about this project; we’d be happy to talk more about our plans :-)

Exposing API data as Linked Data

The Institute of Development Studies (IDS) is a UK-based institute specialised in development research, teaching and communications. As part of their activities, they provide an API to query their knowledge services data set, comprising more than 32k abstracts or summaries of development research documents related to 8k development organisations, almost 30 themes and 225 countries and territories.

A month ago, Victor de Boer and I got a grant from IDS to investigate exposing their data as RDF and building some client applications making use of the enriched data. We aimed at using the API as it is and creating 5-star Linked Data by linking the created resources to other resources on the Web. The outcome is the IDSWrapper, which is now freely accessible, both as HTML and as RDF. Although this is still work in progress, this wrapper already shows some of the advantages of publishing the data as Linked Data.

Enriched data through linkage

When you query for a document, the API tells you the language in which this document is written, for instance “English”. The wrapper replaces this information with a reference to the matching resource in Lexvo. The property “language” is also replaced by the equivalent property defined in Dublin Core, which is commonly used to denote the language a given document is written in. For the data consumer, Lexvo provides alternate spellings of the language name in different languages. Instead of just knowing that the language is named “English”, the data consumer will know, after dereferencing the data from Lexvo, that this language is also known as “Anglais” in French or “Engelsk” in Danish.
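As a rough illustration of that replacement (not the wrapper's actual code), the sketch below swaps a plain language name for a Lexvo URI and uses the Dublin Core property as predicate. The lookup table and the shape of the API record are assumptions made for the example; the wrapper itself resolves languages by calling Lexvo rather than using a hard-coded table.

```python
# Dublin Core "language" property and Lexvo's URI prefix for ISO 639-3 codes.
DCTERMS_LANGUAGE = "http://purl.org/dc/terms/language"
LEXVO_ISO639_3 = "http://lexvo.org/id/iso639-3/"

# Hypothetical lookup table from language names to ISO 639-3 codes.
LANGUAGE_CODES = {"English": "eng", "French": "fra", "Danish": "dan"}


def enrich_language(document_uri, api_record):
    """Return a (subject, predicate, object) triple pointing at Lexvo,
    or None when the language name is missing or unknown."""
    code = LANGUAGE_CODES.get(api_record.get("language"))
    if code is None:
        return None
    return (document_uri, DCTERMS_LANGUAGE, LEXVO_ISO639_3 + code)


# Hypothetical document URI and API record.
triple = enrich_language("http://example.org/document/1", {"language": "English"})
print(triple)
```

The object of the triple is now a URI that a consumer can dereference, instead of a string that only makes sense to English speakers.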

Part of the description of a document

Links can also be established with other resources to enrich the results provided. For instance, the information provided by IDS about the countries is enriched with a link to their equivalent in Geonames. That provides localised names for the countries as well as geographical coordinates.

Part of the description of the resource "Gambia"

Similarly, the description of themes is linked with their equivalent in DBpedia to benefit from the structured information extracted from their Wikipedia page. Thanks to that link, the data consumer gets access to some extra information such as pointers to related documents.

Part of the description of the theme "Food security"

Besides, the resources exposed are also internally linked. The API provides an identifier for the region a given document is related to. In the wrapper, this identifier is turned into the URI corresponding to the relevant resource.

Example of internal link in the description of a document

Integration on the data publisher side

All of these links are established by the wrapper, using either SPARQL requests (for DBpedia) or calls to data APIs (for Lexvo and Geonames). This is something any client application could do, obviously, but one advantage of publishing Linked Data is that part of the data integration work is done server side, by the person who knows the most about what the data is about. A data consumer just has to use the links already there instead of having to figure out a way to establish them himself.
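To give an idea of what such a SPARQL request can look like, here is a small sketch that builds a lookup URL for the public DBpedia endpoint, searching for a resource by its English label. The query itself is an assumption written for this example; the wrapper's real queries may well differ.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def dbpedia_lookup_url(label, lang="en"):
    """Build a GET URL asking DBpedia for a resource with the given label."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@{lang} .\n'
        "} LIMIT 1"
    )
    # The SPARQL protocol passes the query text in the "query" parameter.
    return DBPEDIA_ENDPOINT + "?" + urlencode({"query": query})


url = dbpedia_lookup_url("Food security")
# Decode the URL again to show the query that would be sent.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Exact label matching is fragile (capitalisation, redirects, ambiguous names), so a real linking service would need something more forgiving than this.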

A single data model

Another advantage for a data consumer is that all the data published by the wrapper, as well as all the connected data sets, are published in RDF. That is one single data model to consume. A simple HTTP GET asking for RDF content returns structured data for the content exposed by the wrapper, and for the data from DBpedia, Lexvo and Geonames. There is no need to worry about different data formats returned by different APIs.
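A minimal sketch of that “simple HTTP GET asking for RDF content”, using only the Python standard library: the request carries an Accept header naming an RDF media type. Turtle is used here as an assumption; which serialisations a given server actually offers depends on that server.

```python
from urllib.request import Request, urlopen


def rdf_request(uri, media_type="text/turtle"):
    """Prepare a GET request that asks for an RDF serialisation."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri, media_type="text/turtle"):
    """Send the request and return the response body as text."""
    with urlopen(rdf_request(uri, media_type), timeout=10) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch_rdf() does.
request = rdf_request("http://example.org/resource/1")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The same function can be pointed at a wrapper URI, a DBpedia resource or a Lexvo language, which is the point of having a single data model.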

Next steps

We are implementing more linking services and also working on making the code more generic. Our goal, which is only partially fulfilled for now, is to have a generic tool that only requires an ontology for the data set in order to expose it as Linked Data. The code is freely available on GitHub; watch us to stay up to date with the evolution of the project ;-)
