Posts Tagged ‘research’

The World Wide Semantic Web (WWSW) and Web Alliance for Regreening in Africa (W4RA) on Dataversity

A few days ago, Jennifer Zaino from Dataversity and I (from DANS/eHG/VUA/WWSW) had a nice discussion about what the WWSW and W4RA communities are doing to improve information sharing in challenging contexts. She then published a very good summary of this discussion in a blog post entitled “Bringing the Semantic Web and Linked Data to All” 🙂

It was a pleasure to be given the opportunity to speak about our work, and I hope this article will attract and motivate more people to join our teams. The more of us thinking about these issues and acting upon them, the better and faster the outreach will be 😉


Why don’t we use gaming consoles for data visualisation?

The official Wii U logo (Photo credit: Wikipedia)

Yesterday I was sitting in a very interesting meeting with some experts in data visualisation. There were a lot of impressive things presented, and the Wii Remote and the Kinect were mentioned a couple of times. From what I have observed so far, these devices are used as a cheap way to get sensors. And they certainly deliver: in the fields of user interfaces and robotics, achievements have been made thanks to these peripherals. But why does nobody seem to be using the complete gaming devices? Even the research field of serious gaming shows little overlap with the console gaming industry.

I’m a fan of Nintendo so my argumentation will be a bit Nintendo-centric, but the same point could easily be made about the devices from Nintendo’s competitors. Developing data visualisation on a Wii-U, or on a handheld device like the 3DS, has the potential to save time and reach a greater audience. The development kits sold to the gaming industry are reasonably priced and give access to a consumer-product-grade gaming toolkit that is ready to use. As far as the cheap-hardware argument goes, the Wii-U is rather interesting: it’s a strong GPU with HDMI output associated with a tablet carrying all the sensors one may expect. There are also pointing capabilities inherited from the Wii and a dedicated social network for applications running on the Wii-U. Outside of gaming, this social network is already being used on the Wii-U for social TV and will certainly be used for new incarnations of services that used to be on the Wii (Weather, Polls, …). All of this works out of the box; there is no need to hack new things before getting on with making great interactive visualisations or serious games.

Then comes the argument of coding for a dedicated platform. It is true that the Wii-U runs a dedicated operating system which can be expected to be deployed on all of Nintendo’s devices but not outside of Nintendo’s realm (pretty much like Apple’s iOS!). So far, Nintendo has applied generation-1 compatibility to its devices, meaning that things developed for one generation of console will work on the next one. The Wii-U runs Wii-U and Wii software. The Wii runs Wii and GameCube software, etc. Previous iterations of the backward compatibility required dedicated additional hardware, but they seem to have stopped doing that now. Thus, with a new generation of gaming consoles every 6 or 7 years, this gives 12 to 14 years of stability for anything developed on one platform. Another goody is that, as a developer, you will not need to update your visualisation to deal with the console updates that will happen over this period. Actually, things are always developed for a dedicated platform. As far as picking one such platform goes, I would rather bet on the Web platform than on Java, Android, iOS or Flash. It is the only one focusing on open standards that everyone can implement. Applications developed with modern Web technologies can run everywhere these technologies are supported (including the Wii-U, thanks to the popular WebKit!). The Google Street View application for the Wii-U has been coded in HTML5, using no native code.

In terms of outreach, developing our research prototypes for hardware from the gaming industry would bring our products to the living room. That is closer to a wide, diverse share of the people whose money actually funds (public) research. If the output of a research project can make it to the marketplace of a console device, everybody will be able to just download it and use it from the couch, perhaps even involving other family members and, now, remotely connected friends via the integrated social networking features.

Nintendo and its competitors are working hard at bringing new entertaining and social experiences. This goes well beyond the mere gaming they focused on only a couple of years ago. Entertainment giants expect us to throw out our DVD players, media players, smart TVs and music players to just use their console and a dumb (big) screen. I think it would be a waste not to consider their hardware when we plan our research activities. Let me know if you think otherwise 😉

Exposing API data as Linked Data

The Institute of Development Studies (IDS) is a UK-based institute specialised in development research, teaching and communications. As part of their activities, they provide an API to query their knowledge services data set comprising more than 32k abstracts or summaries of development research documents related to 8k development organisations, almost 30 themes and 225 countries and territories.

A month ago, Victor de Boer and I got a grant from IDS to investigate exposing their data as RDF and building some client applications making use of the enriched data. We aimed at using the API as it is and creating 5-star Linked Data by linking the created resources to other resources on the Web. The outcome is the IDSWrapper, which is now freely accessible, both as HTML and as RDF. Although this is still a work in progress, the wrapper already shows some of the advantages of publishing the data as Linked Data.
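To give a rough idea of the approach, here is a minimal Python sketch of what such a wrapper does: fetch a record from the API and re-expose it as RDF, adding links along the way. The endpoint, field names and base URI below are placeholders for illustration only, not the real IDS API nor the actual IDSWrapper code.

import requests
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

# Hypothetical endpoint and fields, placeholders for illustration only.
IDS_API = "http://api.ids.example/openapi/documents"
BASE = Namespace("http://example.org/idswrapper/document/")

def document_as_rdf(doc_id):
    """Fetch one document record from the (hypothetical) API and re-expose it as RDF."""
    record = requests.get("%s/%s" % (IDS_API, doc_id), timeout=10).json()
    g = Graph()
    doc = BASE[doc_id]
    g.add((doc, RDF.type, URIRef("http://purl.org/ontology/bibo/Document")))
    g.add((doc, DC.title, Literal(record["title"])))
    # Linking step: replace the plain language name by a Lexvo URI.
    if record.get("language_name") == "English":
        g.add((doc, DC.language, URIRef("http://lexvo.org/id/iso639-3/eng")))
    return g

# print(document_as_rdf("A12345").serialize(format="turtle"))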

Enriched data through linkage

When you query for a document, the API tells you the language in which the document is written, for instance “English”. The wrapper replaces this information by a reference to the matching resource in Lexvo. The property “language” is also replaced by the equivalent property defined in Dublin Core, commonly used to denote the language a given document is written in. For the data consumer, Lexvo provides alternate spellings of the language name in different languages. Instead of just knowing that the language is named “English”, the data consumer, after dereferencing the data from Lexvo, will know that this language is also known as “Anglais” in French or “Engelsk” in Danish.

Part of the description of a document
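From the consumer side, getting those extra labels is just a matter of dereferencing the Lexvo URI found in the wrapper output. Here is a minimal sketch with rdflib, assuming the names are published as rdfs:label (adapt the property if Lexvo uses a different one):

from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

# Dereference the Lexvo resource the wrapper points to instead of the bare string "English".
lang = URIRef("http://lexvo.org/id/iso639-3/eng")
g = Graph()
g.parse(str(lang))  # content negotiation returns an RDF description of the language

# Print the multilingual names, e.g. "Anglais"@fr or "Engelsk"@da.
for label in g.objects(lang, RDFS.label):
    print(label, label.language)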

Links can also be established with other resources to enrich the results provided. For instance, the information provided by IDS about countries is enriched with a link to their equivalents in Geonames. That provides localised names for the countries as well as geographical coordinates.

Part of the description of the resource "Gambia"

Similarly, the descriptions of themes are linked with their equivalents in DBpedia to benefit from the structured information extracted from their Wikipedia pages. Thanks to that link, the data consumer gets access to some extra information, such as pointers to related documents.

Part of the description of the theme "Food security"

In addition, the exposed resources are internally linked. The API provides an identifier for the region a given document is related to. In the wrapper, this identifier is turned into the URI of the corresponding resource.

Example of internal link in the description of a document

Integration on the data publisher side

All of these links are established by the wrapper, using either SPARQL queries (for DBpedia) or calls to data APIs (for Lexvo and Geonames). This is something any client application could do, obviously, but one advantage of publishing Linked Data is that part of the data integration work is done server side, by the person who knows the most about what the data is about. A data consumer just has to use the links that are already there instead of having to figure out a way to establish them.
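For the curious, here is roughly what such server-side linking can look like in Python. It is only a sketch: the SPARQL pattern is simplistic, and the GeoNames “demo” username is a placeholder you would replace with your own account.

from SPARQLWrapper import SPARQLWrapper, JSON
import requests

def link_theme_to_dbpedia(theme_label):
    """Look up a DBpedia resource whose English label matches the theme name (sketch)."""
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?resource WHERE { ?resource rdfs:label "%s"@en . } LIMIT 1
    """ % theme_label)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0]["resource"]["value"] if bindings else None

def link_country_to_geonames(country_name, username="demo"):
    """Query the GeoNames search API for the country and return its Linked Data URI."""
    hits = requests.get("http://api.geonames.org/searchJSON",
                        params={"q": country_name, "featureCode": "PCLI",
                                "maxRows": 1, "username": username},
                        timeout=10).json().get("geonames", [])
    return "http://sws.geonames.org/%s/" % hits[0]["geonameId"] if hits else None

# link_theme_to_dbpedia("Food security")  ->  e.g. "http://dbpedia.org/resource/Food_security"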

A single data model

Another advantage for a data consumer is that all the data published by the wrapper, as well as all the connected data sets, are published in RDF. That is one single data model to consume. A simple HTTP GET asking for RDF content returns structured data for the content exposed by the wrapper, as well as for the data from DBpedia, Lexvo and Geonames. There is no need to worry about different data formats returned by different APIs.
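In practice this means a consumer can pull the wrapper output and any of the linked sources with exactly the same retrieval code. A small sketch with rdflib, using two of the URIs mentioned above (rdflib takes care of the content negotiation):

from rdflib import Graph

# One data model, one retrieval pattern: the same parse call works for the wrapper,
# DBpedia, Lexvo or Geonames, and everything merges into a single graph.
g = Graph()
for uri in ("http://dbpedia.org/resource/Food_security",
            "http://lexvo.org/id/iso639-3/eng"):
    g.parse(uri)

print(len(g), "triples merged into a single graph")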

Next steps

We are implementing more linking services and also working on making the code more generic. Our goal, which is only partially fulfilled for now, is to have a generic tool that only requires an ontology for the data set in order to expose it as Linked Data. The code is freely available on GitHub; watch the repository to stay up to date with the evolution of the project 😉

Take home messages from ePSIplatform Conference

Open Data stickers

On March 16, 2012, the European Public Sector Information Platform organised the ePSIplatform Conference 2012 around the theme “Taking re-use to the next level!”. It was a very well organised and interesting event, and also a good opportunity to meet new people and put a face to the names seen in emails and during teleconferences 🙂

The program was intense: 3 plenary sessions, 12 break-out sessions and project presentations during the lunch break. That was a lot to talk about and a lot to listen to. I left Rotterdam with a number of take-home messages and food for thought. What follows is a mix of my own opinions and things said by some of the many participants and speakers at the event.

We need to think more about data re-use

It’s a fact: Open Data has gained momentum and more and more data portals are being created. DataCatalogs currently lists 216 sources of Open Data. There could be something around a million Open Data data sets now available, but how many applications? Maybe around 100k, at most. Furthermore, most of these applications do not really address “real problems” (e.g. helping decision makers make educated choices by providing them with the right data at the right time, or optimising food distribution processes). Even if the definition of a “real problem” is open to discussion, there is surely something to think about.

This low number of applications could be explained by a lack of problems to tackle as much as by a lack of motivated developers. The ePSI platform has just started a survey on story sharing. Reading about the (positive) experiences of others is likely to inspire readers and get more developers on board. The upcoming W3C event about using Open Data will also be a good place to share such stories and spot the things to do next to foster an ecosystem of data and developers.

Open Data should be interactive

We have Open Data and we have Open Data consumers that happily take the data, process it and eventually re-publish it. Fine, but we do poorly when it comes to capturing the added metadata from these users. If one of them spots an error in an open data set, or if missing data is identified, there is hardly any way to communicate this information to the data publisher. Most, if not all, data portals are “read only”, and any feedback they receive may not find a matching processing pipeline. Open source software solved this issue by using open bug trackers that allow for reporting bugs and feature requests and facilitate dispatching the issues to the persons in charge of implementing them. Using such bug trackers to keep the data users in the loop sounds like a good plan. This is something we have started to look at, in a slightly different way, for the projects CEDA_R and Data2Semantics. One of the use cases of these projects is the Dutch historical census data (from 1795 onwards), which has to be harmonised and debugged (a lot of manual work was involved in converting the paper reports into digital form). Only historians can take care of this, and they need to inform the data publisher about their findings – preferably using something even easier than the average bug tracker.

Open (messy) Data is a valuable business

Economic issues are common when speaking about Open Data. They could even be seen as the main obstacle to it, the other obstacles (technical, legal and societal/political) being easier to address. So the trick is to convince data owners that, yes, they will lose the money they currently get in access fees, but they will get more out of the Open Data, in an indirect way, through the businesses created. In fact, there is no market for the Open Data itself. Instead, this Open Data has to be seen as part of the global data market, of which DataPublica and OpenCorporates are two examples. In this market, curating and integrating data is a service clients can be charged for. Data companies transform the data into information and put a price tag on the process. For this matter, having to publish an integrated data set as Open Data because it includes pieces of another Open Data set licensed under a GPL-like license will break the process. Open Data is easier to consume when licensed under more BSD-like licenses.

If there is a market for messy open data, one can wonder whether Linked Data is going against businesses or helping them. Linked Data allows for doing data integration at the publication level, and Open Data exposed using these principles is richer and easier to consume. This means less work for the consumer, who may be spared the cost of hiring someone to integrate the data. But Linked Data facilitates the job of data companies too: they could invest the time saved into the development of visualisation tools, for instance. So in the end, it may not be such a bad idea to continue promoting Linked Data 😉

Open Data initiatives need to become more consistent

Besides the definition given on OpenDefinition, and the 5-star scheme of Tim Berners-Lee for Linked Data, there is not much out there to tell people what Open Data is and how to publish it. Data portals can be created from scratch or use CKAN, and may expose the metadata about the data sets they contain in different ways (e.g. using DCAT or something else). The data itself can be published in a large spectrum of formats, ranging from XLS sheets to PDFs to RDF. Besides this, data portals can be created at the scale of a city, a region, an entire country or an entity such as the EU. These different scales are related to each other, and the current landscape can be seen as the result of a lack of coordination. Directories are important as a way to know what data is out there, and also what data is missing. If everyone takes initiatives at different scales, this indexing process will be fuzzy and the outcome quite confusing for data users looking for open data sets. On the other hand, self-organisation is often the best solution to build and maintain complex systems (c.f. “Making Things Work” by Y. Bar-Yam). So maybe things are good as they are, but we should still avoid ending up with too many data portals partially overlapping and incompatible with each other.

As far as the data is concerned, PDF, XLS, CSV, TSV, … are all different ways to create data silos that just provide a single view over the data – even a non-machine-readable one in the case of many PDFs. RDF is here to improve consistency across data sets with a unique, graph-based data model. This data model facilitates sharing data across data sets. It is not the only solution for that, the Dataset Publishing Language (DSPL) from Google being another one, but it is the only one based on W3C standards. This guarantees the openness of the data format and constant support, just as for the standards that make the Web (HTML, HTTP, CSS, …).

Don’t underestimate the “wow” effect

During one of the break-out sessions, I was intrigued to hear one of the panel speakers say he would like to see more DSPL around than RDF. After some (peaceful) discussion, we agreed on the following points: RDF is more expressive than DSPL, but DSPL comes with an easy-to-use suite of plug-and-play tools to play with the data. It seems that if you want to re-use Open Data to do some plots, perhaps for some data journalism use cases, you are better off using DSPL. It is simpler and, through the data explorer, allows anyone to build graphs in a few clicks. Users prefer having buttons and sliders to play with simpler data rather than knowing that they hold the most powerful knowledge representation scheme in their hands and could do anything with it – but end up doing nothing with it because of the steep learning curve it induces. I’m all in favour of Open Data and I try to motivate people, and myself sometimes, to use Linked Data to publish data sets. Still, I think we have a major issue there: our data model is better, but we do not yet compete on the usability side of the story.

Another manifestation of the “wow” effect: the most impressive visualisation shown at the event was part of the video documentary series “The Netherlands from above” and its matching interactive data explorers. It is a very nicely done job, but the interesting bit is that not only was the data not linked, it was also not open! And yet, even at an event about re-use of Open Data, nobody seemed to care much. The data was acquired for free from different providers (with some difficulty in some cases), had to be curated and transcoded, and could not be shared. But the movies are very nice, and the sliders on the interactive pages fun to play with…

We must not rest on our laurels

Finally, and that was also the final message of the event, we should not rest on our laurels. Open Data is well received. Many are adopting the “Open unless” way of thinking, but others create an Open Data portal just because it is trendy, and trash it after a few months. We need to continue explaining to data owners why they should open their data and why Linked Data is a good technical solution to implement. Then, we need to find more active users for the data because, in the end, if the data is used, nobody will even dare shut down the portal serving it. Having these active users may be our only guarantee that data published as Open Data will remain so for the years to come.

Complexity and cooperation

Last week, I attended a seminar about “Understanding and Managing Complex Systems” organised by the Royal Netherlands Academy of Arts and Sciences (KNAW) together with the Netherlands Organisation for Scientific Research (NWO). The take-home messages from this seminar are that 1) Complex Systems are highly popular in Amsterdam (all 200 available seats were taken the day registration opened) and 2) Complex Systems is the science of cooperation.

In the first session, Martin Nowak explained in a very good talk that 5 different cooperation mechanisms can be observed in an evolving population. 1) Direct reciprocity: individuals cooperate if the individuals they interact with are cooperative (“Tit for Tat”). 2) Indirect reciprocity: based on reputation, this motivates cooperation through the expected social gain; cooperators gain reputation points and become known as cooperators in the network. This important mechanism could not survive without extended communication capabilities making it possible to spread that reputation. 3) Spatial distribution: cooperation (and defection) is better achieved in clusters of individuals, and both behaviours can co-exist within different clusters. This connects with 4) Group selection: group selection has a multi-level aspect, and infections at one scale can have more impact at another. Lastly, 5) Kin selection, or nepotism: individuals tend to cooperate more with others close to their kin and defect against those that are less similar. Then, Kees Stam explained how the brain exhibits small-world and scale-free properties. It is also modular, with several zones dedicated to particular tasks and cooperating. Mental disorders, and also the effects of aging, map to changes in the connectivity between the different hubs in the network. Although these results were observed on simplified networks, a full model of the network of a brain – with all its neurons – is on its way to being created. That will be a huge network to study!

During the second session, Dan Braha gave a fantastic talk on the importance of the in- and out-degrees of the nodes in a network. The notion of hubs and the global degree are not enough to explain network responses, and the study of the covariance between in- and out-degrees can provide better insights into the dissemination of messages along the connections. This talk was followed by that of Michael Batty, who described the evolution of cities and some models to predict their growth and increase in complexity.

The day concluded with a pleasant musical session by Marten Scheffer, who played and invited us to reflect on the topic of complexity and interaction within our civilisations.

Update: the presentations are now visible online

Updates about SemanticXO

With the last post about SemanticXO dating back to April, it’s time for an update, isn’t it? 😉

A lot of things have happened since April. First, a paper about the project was accepted for presentation at the First International Conference on e-Technologies and Networks for Development (ICeND2011). Then, I spoke about the project during the symposium of the Network Institute as well as during SugarCamp #2. Lastly, a first release of a triple-store-powered Journal is now available for testing.

Publication

The paper entitled “SemanticXO: connecting the XO with the World’s largest information network” is available from Mendeley. It explains what the goal of the project is and then reports on some performance assessment and a first test activity. Most of the information it contains has actually been blogged before on this blog (c.f. there and there), but if you want a global overview of the project, this paper is still worth a read. The conference itself was very nice and I did some networking. I came back with a lot of business cards and the hope of keeping in touch with the people I met there. The slides from the presentation are available from SlideShare.

Presentations

On May 10, the Network Institute of Amsterdam organised a one-day symposium to strengthen the ties between its members and to stimulate further collaboration. This institute is a long-term collaboration between groups from the Department of Computer Science, the Department of Mathematics, the Faculty of Social Sciences and the Faculty of Economics and Business Administration. I presented a poster about SemanticXO and an abstract went into the proceedings of the event.

More recently, I spent the 10th and 11th of September in Paris for Sugar Camp #2, organised by OLPC France. Bastien arranged a bit of time for me on Sunday afternoon to re-do the presentation from ICeND2011 (thanks again for that!) and get some feedback. This was a very well organised event held at a cool location (“La cité des sciences”), and it was also the first time I met so many other people working on Sugar. I could finally put some faces to the names I had seen so many times on the mailing lists and in the Git logs 🙂

First SemanticXO prototype

The project development effort is split into 3 parts: a common layer hiding the complexity of SPARQL, a new implementation of the journal datastore, and the coding of diverse activities making use of the new semantic capabilities. All three are going more or less in parallel, at different speeds, as, for instance, the work on activities directs what the common layer will contain. I’ve focused my efforts on the journal datastore to get something ready to test. It’s a very first prototype, coded by starting from the genuine datastore 0.92 and replacing the part in charge of the metadata. The code taking care of the files remains the same. This new datastore is available from Gitorious, but because installing the triple store and replacing the journal is a tricky manual process, I bundled all of that 😉
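To give an idea of what “hiding the complexity of SPARQL” could mean for activity authors, here is a hypothetical Python sketch; the class and method names are mine for illustration, not the actual SemanticXO API.

# Hypothetical sketch of a "common layer": activities set metadata through simple
# method calls and never see SPARQL; names are illustrative, not the real API.
import uuid
from rdflib import Graph, Literal, Namespace, URIRef

JOURNAL = Namespace("http://example.org/semanticxo/journal#")

class JournalEntry:
    def __init__(self):
        self.uri = URIRef("urn:uuid:%s" % uuid.uuid4())
        self.graph = Graph(identifier=self.uri)  # one named graph per journal entry

    def set_metadata(self, key, value):
        """Activity code calls this instead of writing SPARQL by hand."""
        self.graph.add((self.uri, JOURNAL[key], Literal(value)))

    def save(self):
        # The real layer would push this to the bundled triple store (e.g. via its
        # HTTP interface or SPARQL Update); here we just serialise it for inspection.
        return self.graph.serialize(format="turtle")

entry = JournalEntry()
entry.set_metadata("title", "My drawing")
entry.set_metadata("activity", "org.laptop.Paint")
print(entry.save())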

Installation

The installation bundle consists of two files, “semanticxo.tgz” and a script “patch-my-xo.sh”. To install SemanticXO, download both, put them in the same location somewhere on your machine, and then type (as root):

sh ./patch-my-xo.sh setup

This will install a triple store, add it to the daemons started at boot time, and replace the default journal with one using the triple store. Be careful to have backups if needed, as this will remove all the content previously stored in the journal! Once the script has been executed, reboot the machine to start using the new software.

The bundle has been tested on an XO-1 running software release 11.2.0, but it should work with any software release on both the XO-1 and the XO-1.5. It won’t work on the XO-1.75, as it contains a binary (the triple store) not compiled for ARM.

What now?

Now that you have the thing installed, open the browser and go to “http://127.0.0.1:8080”. You will see the web interface of the triple store, which allows you to make some SPARQL queries and see which named graphs are stored. If you are not fluent in SPARQL, the named graph interface is the most interesting part to play with. Every entry in the journal gets its own named graph; after having populated the journal with some entries, you will see this list of named graphs grow. Click on one of them and the content of the journal entry will be displayed. Note that this web interface is also accessible from any other machine on the same network as the XO. This yields new opportunities in terms of backup and information gathering: a teacher can query the journal of any XO directly from a school server, or from another XO.
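As an illustration of that remote-query scenario, a teacher-side script could look something like the sketch below; the SPARQL endpoint path (“/sparql”) and the XO’s address are assumptions to adapt to the actual setup of the bundled triple store.

# Sketch: list the journal entries (named graphs) of an XO from another machine.
# Endpoint path and IP address are placeholders; adapt them to your network.
from SPARQLWrapper import SPARQLWrapper, JSON

XO_IP = "192.168.1.42"  # placeholder address of the XO on the school network
sparql = SPARQLWrapper("http://%s:8080/sparql" % XO_IP)
sparql.setQuery("""
    SELECT ?g (COUNT(*) AS ?triples)
    WHERE { GRAPH ?g { ?s ?p ?o } }
    GROUP BY ?g
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["g"]["value"], row["triples"]["value"])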

Removing

The patch script comes with an uninstall function if you want to revert the XO to its original setup. To use it, simply type (as root):

sh ./patch-my-xo.sh remove

and then reboot the machine.
