
Impressions of the Data-driven Visualization Symposium

Program of the event

Today I attended an event entitled “Data-driven Visualization Symposium” in the beautiful Trippenhuis building of the KNAW in Amsterdam. There was a really rich schedule, with 10 speakers showcasing some of their work in the area of big data and visualisation.

Though I would have appreciated getting a bit more of the how instead of just seeing finished products, I really enjoyed the presentations. Using two very different application domains (respectively the design of logos and the exploration of massive data), Wouter van Dijk and Edwin Valentijn showed cases where information overload can be dealt with using clever reduction techniques. The data has to be turned into something more communicative and become part of our interaction schemes, alongside other social, mediated and IoT interactions. But at the end of his presentation about an atlas of Pentecostalism, Richard Vijgen also rightly reminded us to be very cautious when bringing data to an audience which has not received the critical-thinking training researchers receive. This population may take everything we depict for granted without questioning the graphical representation.

Mirjam Leunissen followed up on this idea, showing cases from Fox News and Utrecht University of visuals that trick the reader (e.g. 6M depicted graphically as 2/3 of 7M). Her talk was about how visuals can be used either to convey the numbers or to convey an emotion/impression, the latter being more common among data journalists. She pointed to several great examples of impression-rich visualisations, including a view of the number of Syrian refugees, gun deaths in the US, and the depth at which the black boxes of a lost flight are expected to be found. To complete the showcasing of applications, Timo Hartmann and Anton Koning explained how visualisations can be used for collective decision making and mediated interactions in the civil engineering and medical domains.
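
To make the axis trick Mirjam Leunissen warned about a bit more concrete, here is a minimal matplotlib sketch; the 6M/7M figures are taken from the example above, while the labels and axis limits are purely illustrative:

```python
import matplotlib.pyplot as plt

values = [7_000_000, 6_000_000]  # the two figures from the example above
labels = ["A", "B"]

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

# Honest version: the y-axis starts at zero, so 6M visually reads as ~86% of 7M.
honest.bar(labels, values)
honest.set_ylim(0, 7_500_000)
honest.set_title("Axis starts at 0")

# Misleading version: truncating the axis makes 6M look like a small fraction of 7M.
misleading.bar(labels, values)
misleading.set_ylim(5_500_000, 7_500_000)
misleading.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```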

In contrast to these portfolio presentations, Laurens van der Maaten and Elmar Eisemann gave two more technical talks, respectively describing the t-SNE algorithm for visual clustering and explaining how common gaming techniques (frustum culling, LOD, ray tracing, …) can be combined with more advanced tricks related to the eye’s sensitivity to provide high-throughput, personalised rendering. These techniques have been applied for rendering flooding models in the project “3DI”. A remote rendering cluster is used to ensure high-end graphics can be processed and shown on lower-grade hardware.
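
For anyone who wants to try the clustering side of this, here is a minimal sketch of running t-SNE with scikit-learn; the digits dataset and the parameter values are my own illustrative choices, not something presented at the symposium:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Load a small, high-dimensional dataset (64 features per sample).
digits = load_digits()

# t-SNE embeds the samples into 2D while trying to preserve local neighbourhoods,
# which is what makes clusters visually separable.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, s=5, cmap="tab10")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```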

Last but not least, Maarten van Meersbergen and Tijs de Kler gave an overview of what the eScience center has to offer in terms of hardware and expertise, inviting the participants to make good use of both and to crash-test their data as early and often as possible.

As hinted a bit earlier, what I missed most was more explanation and guidance on how to find the best way to convey a story with visual representations, perhaps along with a bit more information about the tooling to use. I also missed seeing visualisations using Wii-Us but that’s a different story 😉 Right now, I do not have a much clearer idea of how we should visualise our census data from CEDAR than I had before attending the symposium… let’s see whether something pops up later when thinking more about everything that was shown today 🙂

One year of the PiLOD project

Yesterday was the closing event of the Pilot Linked Open Data project. A sizeable crowd of politicians, civil servants, hackers, SME owners, open data activists and researchers gathered in the very nice building of the RCE in Amersfoort to hear about what has been done within this one-year project led by Erwin Folmer. The participants also got some more insight into Linked Data usage outside of the project, as well as a guided tour through the RCE. More information, photos, and links to the slides can be found in the report about the event.

Oliver Bartlett and John Walker gave two keynotes explaining how Linked Data is put into use at the BBC and at NXP, respectively. Both companies are using this technology to better describe their content and interconnect separate data sources. A shared objective, besides having better and more efficient internal processes, is to provide better services to the customers. Thanks to the harmonization and linkage of the data, these customers can expect to get more coherent data about what they care about, be it a chip or a football player. The two presentations also highlighted two important facts about Linked Data: 1) it is versatile enough to be applied to two very different business domains such as media and chip manufacturing, and 2) the data does not have to be open to benefit from Semantic Web technologies – as of now, a lot of data at the BBC is becoming LD but none of this LD is LOD.

My activity within the project was around chatting (a lot, as I usually do :-p), writing two book chapters (“Publishing Open Data on the Web” and “How-to: Linking resources from two datasets”) and lending a hand on the “HuisKluis” work package managed by Paul Francissen. I spoke a bit about the latter, showing a demo and some slides to explain how data is managed in the back-end. In short, the “HuisKluis” is a place where information about a house can be found and shared. See the following video for a better introduction:

The prototype can be found at http://pilod-huiskluis.appspot.com/. It only works for houses in the Netherlands, but there are a few example houses that can be used too.


Here are a few slides giving more details about the implementation:

If you want to really know everything about how things work, feel free to just look at the source code.

This PiLOD project was a pleasant and enriching experience, and I’m very much looking forward to a PiLOD2 for a second year of LOD brainstorming and hacking together with Marcel, Arjen, Erwin, Paul, Lieke, Hans, Bart, Dimitri, … and the rest of the (rather big) group 🙂 This post is also a good opportunity to thank the Network Institute again for having supported this collaboration with a generous research voucher. Thanks!

First release of SemanticXO!

Here it is: the first fully featured release of SemanticXO! Use it in your activities to store and share any kind of structured information with other XOs. The installation procedure is easy and only requires an XO-1 running version 12.1.0 of the operating system. Go to the Git repository and download the files “setup.sh” and “semanticxo.tar.gz” somewhere on the XO (these files are in the directory “patch_my_xo”). Then, log in as root and execute “sh setup.sh setup”. The installation package will copy the API onto the XO, set up the triple store and install two demo activities. Once the procedure is complete, reboot the XO to activate everything.

The XO after the installation of SemanticXO

There are two demo activities which are described in more detail on the project page. Under the hood, SemanticXO provides an API to store named graphs containing the description of one or several resources. These named graphs are marked with an author name, a modification date and, optionally, a list of other devices (identified by their URI) to share the graph with. This data is used by a graph replication daemon which, every 5 minutes, browses the network using Avahi, finds other triple stores, and downloads a copy of the graphs that are shared with it. The data backend of the mailing activity provides a good example of how the API is used.
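
As an illustration of the kind of metadata involved, here is a minimal rdflib sketch of the pattern described above. This is not the actual SemanticXO API: the namespace, URIs and the sharedWith property are made up for the example.

```python
from datetime import datetime, timezone
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical namespace standing in for whatever vocabulary SemanticXO uses.
SXO = Namespace("http://example.org/semanticxo#")

ds = Dataset()
graph_uri = URIRef("http://xo-1.local/graphs/my-notes")
g = ds.graph(graph_uri)

# The resource descriptions stored inside the named graph.
g.add((URIRef("http://xo-1.local/resources/note-1"), DCTERMS.title, Literal("Shopping list")))

# Metadata attached to the graph itself: author, modification date,
# and the devices (identified by URI) the graph should be shared with.
meta = ds.graph(URIRef("http://xo-1.local/graphs/metadata"))
meta.add((graph_uri, DCTERMS.creator, Literal("alice")))
meta.add((graph_uri, DCTERMS.modified, Literal(datetime.now(timezone.utc))))
meta.add((graph_uri, SXO.sharedWith, URIRef("http://xo-2.local/")))

print(ds.serialize(format="trig"))
```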

Exposing API data as Linked Data

The Institute of Development Studies (IDS) is a UK-based institute specialised in development research, teaching and communications. As part of their activities, they provide an API to query their knowledge services data set comprising more than 32k abstracts or summaries of development research documents related to 8k development organisations, almost 30 themes and 225 countries and territories.

A month ago, Victor de Boer and I got a grant from IDS to investigate exposing their data as RDF and building some client applications making use of the enriched data. We aimed to use the API as it is and to create 5-star Linked Data by linking the created resources to other resources on the Web. The outcome is the IDSWrapper, which is now freely accessible both as HTML and as RDF. Although this is still work in progress, the wrapper already shows some of the advantages of publishing the data as Linked Data.

Enriched data through linkage

When you query for a document, the API indicates the language in which this document is written, for instance “English”. The wrapper replaces this information by a reference to the matching resource in Lexvo. The property “language” is also replaced by the equivalent property defined in Dublin Core, commonly used to denote the language a given document is written in. For the data consumer, Lexvo provides alternative spellings of the language name in different languages. Instead of just knowing that the language is named “English”, the data consumer, after dereferencing the data from Lexvo, will know that this language is also known as “Anglais” in French or “Engelsk” in Danish.
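
A minimal sketch of what that replacement could look like with rdflib is shown below; the document URI and the label-to-Lexvo mapping are assumptions for the example, not the wrapper’s actual code:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical mapping from the language label returned by the IDS API
# to the corresponding Lexvo resource (ISO 639-3 code for English is "eng").
LEXVO = {"English": URIRef("http://lexvo.org/id/iso639-3/eng")}

def describe_document(doc_uri: str, api_language: str) -> Graph:
    """Replace the plain language label by a dcterms:language link to Lexvo."""
    g = Graph()
    g.add((URIRef(doc_uri), DCTERMS.language, LEXVO[api_language]))
    return g

# Hypothetical document URI, for illustration only.
print(describe_document("http://example.org/ids/document/A12345", "English").serialize(format="turtle"))
```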

Part of the description of a document

Links can also be established with other resources to enrich the results provided. For instance, the information provided by IDS about the countries is enriched with a link to their equivalent in Geonames. That provides localised names for the countries as well as geographical coordinates.

Part of the description of the resource "Gambia"

Similarly, the description of themes is linked with their equivalent in DBpedia to benefit from the structured information extracted from their Wikipedia page. Thanks to that link, the data consumer gets access to some extra information such as pointers to related documents.

Part of the description of the theme "Food security"

Besides these external links, the resources exposed are also internally linked. The API provides an identifier for the region a given document is related to. In the wrapper, this identifier is turned into the URI of the corresponding resource.

Example of internal link in the description of a document

Integration on the data publisher side

All of these links are established by the wrapper, using either SPARQL queries (for DBpedia) or calls to data APIs (for Lexvo and Geonames). This is something any client application could do, obviously, but one advantage of publishing Linked Data is that part of the data integration work is done server-side, by the person who has the most information about what the data is about. A data consumer just has to use the links already there instead of having to figure out a way to establish them.
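
For example, a lookup against DBpedia could be done with a SPARQL query along these lines; this is a sketch using SPARQLWrapper, and the query and the “Food security” label are illustrative rather than the wrapper’s actual code:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

# Look up the DBpedia resource whose English label matches the IDS theme name.
sparql.setQuery("""
    SELECT ?resource WHERE {
        ?resource rdfs:label "Food security"@en .
    } LIMIT 1
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["resource"]["value"])  # e.g. http://dbpedia.org/resource/Food_security
```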

A single data model

Another advantage for a data consumer is that all the data published by the wrapper, as well as all the connected data sets, are published in RDF. That is one single data model to consume. A simple HTTP GET asking for RDF content returns structured data for the content exposed by the wrapper, as well as for the data in DBpedia, Lexvo and Geonames. There is no need to worry about different data formats returned by different APIs.
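
As a sketch of what such a consumer could do, the snippet below fetches a resource with content negotiation and parses it with rdflib; the resource URI is a hypothetical placeholder, but the same few lines work for any of the linked data sets mentioned above:

```python
import requests
from rdflib import Graph

# Hypothetical resource URI exposed by the wrapper; a DBpedia, Lexvo or Geonames
# resource can be fetched and parsed in exactly the same way.
resource_uri = "http://example.org/idswrapper/document/A12345"

# A plain HTTP GET asking for RDF via content negotiation.
response = requests.get(resource_uri, headers={"Accept": "text/turtle"})

g = Graph()
g.parse(data=response.text, format="turtle")

for subject, predicate, obj in g:
    print(subject, predicate, obj)
```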

Next steps

We are implementing more linking services and also working on making the code more generic. Our goal, which is only partially fulfilled for now, is to have a generic tool that only requires an ontology for a data set in order to expose it as Linked Data. The code is freely available on GitHub; watch the repository to stay up to date with the evolution of the project 😉
