Archive

Posts Tagged ‘SemanticXO’

First release of SemanticXO!

Here it is: the first fully featured release of SemanticXO! Use it in your activities to store and share any kind of structured information with other XOs. The installation procedure is easy and only requires and XO-1 running the operating system version 12.1.0. Go to the GIT repository and download the files “setup.sh” and “semanticxo.tar.gz” somewhere the XO (these files are in the directory “patch_my_xo”). Then, log in as root and execute “sh setup.sh setup”. The installation package will copy the API onto the XO, setup the triple store and install two demo activity. Once the procedure is complete, reboot the XO to activate everything.

The XO after the installation of SemanticXO

There are two demo activities which are described in more details on the project page. Under the hood SemanticXO provides an API to store named graphs containing description of one or several resources. These named graphs are marked with an author name, a modification date and, eventually, a list of other devices (identified by their URI) to share the graph with. This data is used by a graph replication daemon which every 5 minutes browse the network using Avahi, find other triple stores, and download a copy of the graphs that are shared with it. The data backend of the mailing activity provides a good example of how the API is used.

1 minute video about SemanticXO

The VU is making short videos of 1 minute to highlight some of the research that is being done within its walls. This is the video for SemanticXO, realised by Pepijn Borgwat and presented by Laurens Rietveld.

The script is in Dutch and is as follows:

  • Ik ben laurens rietveld en ik doe onderzoek aan de vrije universiteit naar semantische netwerken.
  • Ik wil iets vertellen over onderzoek van Christophe Gueret dat zich richt op laptops die in ontwikkelingslanden gebruikt worden.
  • Dit is de XO laptop, het is een goedkope stevige laptop die onderwijs bij kinderen moet bevorderen.
  • Op de laptop draait sugar, dat is een constructieve leeromgeving speciaal ontworpen voor jonge leerlingen.
  • Op dit moment blijven alle gegevens die gegenereerd worden in de leeromgeving, in de xo laptop. Als een gesloten kleine data doos.
  • Met dit onderzoek willen we data uitwisseling verbeteren door gebruik te maken van principes van het semantic web.
  • Op die manier kan de data, zoals berichten of tekeningen, gemakkelijk binnen kleine lokale netwerken worden verspreid.
  • Zodra 1 laptop met het netwerk verbonden is kan die lokala data delen met de buitenwereld.
  • Andersom kunnen gegevens van de rest van het internet, ook binnen het lokale netwerk worden gedeeld.

In case you don’t speak Dutch, you may find the following translation to be useful 😉

  • My name is Laurens Rietveld and I do research on Semantic Networks at the Free University of Amsterdam.
  • I will tell you about the research of Christophe Guéret which concerns laptops being used in developing countries.
  • This is the laptop “XO”, it is a cheap and robust laptop used to support the education of kids.
  • The laptop runs “Sugar”, a constructionist learning environment especially designed for young learners.
  • Currently, all the data that is generated within the learning environment stays in the XO. Like if it was within a closed data silo.
  • With this research we aim at improving data sharing capabilities by using Semantic Web technologies.
  • In doing so, the data (for instance, messages or drawings) can be easily shared within a local network.
  • As soon as one of these laptop gets access to Internet, it becomes possible to share this data with the outside world too.
  • Vice versa, data from Internet can be downloaded and shared across the local network.

Does it scale?

Scaling is often a central question for data intensive projects, making use of Semantic Web technologies or not, and SemanticXO is no exception to that. The triple store is used as a back end for the Journal of Sugar, which is a central component recording the usage of the different activities. This short post discusses the results found for two questions: “how many journal entries can the triple store sustain?” and “how much disk space is used to store the journal entries?”

Answering these questions means loading some Journal entries and measuring the read and write performances along with the disk space used. This is done by a script which randomly generate Journal entries and insert them in the store. A text sampler and the real names of activities are used to make these entries realistic in terms of size. An example of such generated entry, serialised in HTML, can be seen there. The following graphs show the results obtained for inserting 2000 journal entries. These figures have been averaged over 10 runs, each of them starting with a freshly created store. The triple store used is called “RedStore“, it is called with an hash based BerkleyDB backend. The test machine is an XO-1 running the software 11.2.0.

The disk space is minimal for up to 30 entries, grows rapidly between 30 and 70 entries and continues on a linear basis from that number on. The maximum space occupied is a bit less than 100MB which is few of the 1GB of storage of the XO-1.

 

Amount of disk space used by the triple store

The results for the read and write delay are a bit less of a good news. Write operations are constant in time and always take around 0.1 s. Getting an entry from the triple store proves to get linearly slower as the triple store gets filled. It can be noticed that for up to 600 entries, the retrieval time of an entry is below a second. This should provide a reasonable response time. However, with 2000 entries stored the retrieval time goes as high as 7 seconds 😦

Read and write access time

The answer to the question we started with (“Does it scale?”) is then “yes, for up to 600 entries” considering a first generation device and the current status of the software components (SemanticXO/Redstore/…). This answers also yields new questions, among which: Are 600 entries enough for a typical usage of the XO? Is it possible to improve the software to get better results? How are the result on some more recent hardware?

I would appreciate a bit of help for answering all of these, and especially the last one. I only have an XO-1 and can not thus run my script on an XO-1.5 or XO-1.75. If you have such device and are willing to help me getting the results, please download the package containing the performance script and the triple store and follow the instructions for running it. After a day of execution or so, this script will generate three CSV files that I could then postprocess to get similar curves as the one showed.

Updates about SemanticXO

With the last post about SemanticXO dating back from April, it’s time for an update, isn’t it? 😉

A lot of things happened since April. First, a paper about the project was accepted for presentation at the First International Conference on e-Technologies and Networks for Development (ICeND2011). Then, I spoke about the project during the symposium of the Network Institute as well as during the SugarCamp #2. Lastly, a first release of a triple-store powered Journal is now available for testing.

Publication

The paper entitled “SemanticXO : connecting the XO with the World’s largest information network ” is available from Mendeley. It explains what the goal of the project is and then report on some performance assessement and a first test activity. Most of the information contained has actually been blogged before on this blog (c.f. there and there) but if you want a global overview of the project, this paper is still worth a read. The conference in itself was very nice and I did some networking. I came back with a lot of business card and the hope of keeping in touch with the people I met there. The slides from the presentation are available from SlideShare

Presentations

The Network Institute of Amsterdam organised on May 10 a one-day symposium to strengthen the ties between its members and to stimulate further collaboration. This institute is a long-term collaboration between groups from the Department of Computer Science, the Department of Mathematics, the Faculty of Social Sciences and the Faculty of Economics and Business Administration. I presented a poster about SemanticXO and an abstract went into the proceedings of the event.

More recently, I spent the 10 and the 11 of September at Paris for the Sugar Camp #2 organised by OLPC France. Bastien managed me a bit of time on Sunday afternoon to re-do the presentation from ICeND2011 (thanks again for that!) and get some feedback. This was a very well organised event held at a cool location (“La cité des sciences“), it was also the first time I met so many other people working on Sugar and I could finally put some faces on the name I saw so many time on the mailing lists and on the GIT logs 🙂

First SemanticXO prototype

The project developement effort is split in 3 parts: a common layer hidding the complexity of SPARQL, a new implementation of the journal datastore and the coding of diverse activities making use of the new semantic capabilities. All three are going more or less in parallel, at different speed, as, for instance, the work on activities direct what the common layer will contain. I’ve focused my efforts on the journal datastore to get something ready to test. It’s a very first prototype that has been coded starting with the genuine datastore 0.92 and replacing the part in charge of the metadata. The code taking care of the files remains the same. This new datastore is available from Gitorious but because installing the triple store and replacing the journal is a tricky manual process, I bundled all of that 😉

Installation

The installation bundle consists of two files, a “semanticxo.tgz” and a script “patch-my-xo.sh“. To install SemanticXO, you need to download the two and put them in the same location somewhere on your machine and then type (as root):

sh ./patch-my-xo.sh setup

This will install a triple store, add it to the daemons to start at boot time and replace the default journal by one using the triple store. Be careful to have backups if needed as this will remove all the content previously stored in the journal! Once the script has been executed, reboot the machine to start using the new software.

The bundle has been tested on an XO-1 running the software release 11.2.0 but it should work on any software release on both the XO-1 and XO-1.5. This bundle won’t work on the 1.75 has it contains a binary (the triple store) not compiled for ARM.

What now?

Now that you have the thing installed, open the browser and go to “http://127.0.0.1:8080”. You will see the web interface of the triple store which allows you to make some SPARQL queries and see which named graphs are stored. If you are not fluent in SPARQL, the named graph interface is the most interesting part to play with. Every entry in the journal gets its own named graph, after having populated the journal with some entries you will see this list of named graphs growing. Click on one of them and the content of the journal entry will be displayed. Note that this web interface is also accessible from any other machine on the same network as the XO. This yields new opportunities in terms of backup and information gathering: a teacher can query the journal of any XO directly from a school server, or an other XO.

Removing

The patch script comes with an uninstall function if you want to revert the XO to its original setup. To use it, simply type (as root):

sh ./patch-my-xo.sh remove

and then reboot the machine.

Clustering activity for the XO

In the past few years many data sets have been published and made public in what is now often called the Web of Linked Data, making a step towards the “Web 3.0”: a Web combining a network of documents and data suitable for both human and machine processing. In this Web 3.0, programs are expected to give more precise answers to queries as they will be able to associate a meaning (the semantic) to the information they process. Sugar, the graphical environment found on the XO, is currently Web 2.0 enabled – it can browse web sites – but has no dedicated tools to interact with the Web 3.0. The goal of the SemanticXO project introduced earlier in this blog is to make Sugar Web 3.0 proof by adding semantic software on the XO.

One corner stone of this project is to get a triple store, the software in charge of storing the semantic data, running on the limited hardware of the machine (in our case, an XO-1). As it proved to be feasible, we can now go further and start building activities making use of it. And to begin with, a simple clustering activity: the goal there is to cluster into boxes using drag&drop. The user can create as many boxes as he needs, and the items may be moved around boxes. Here is a screenshot of the application, showing Amerindian items:

Prototype of the clustering activity

The most interesting aspect of this activity is actually under its hood and is not visible on the screenshot. Here is a some of the triples generated by the application (note that the URLs have been shortened for readability) :

subject predicate object
olpc:resource/a05864b4 rdf:type olpc:Item
olpc:resource/a05864b4 olpc:name “image114”
olpc:resource/a05864b4 olpc:hasDepiction “image114.jpg”
olpc:resource/a82045c2 rdf:type olpc:Box
olpc:resource/a82045c2 olpc:hasItem olpc:resource/a05864b4
olpc:resource/78cbb1f0 rdf:type olpc:Box

It is relevant to note here the flexibility of that data model: The assignment of one item to the only box is stated by a triple using the predicate “hasItem”, one of the box is empty because there is no such statement linking it to an item. A varied number of similar triples can be used, without any constraint and the same goes for actually all the triples in the system. There is no requirement for a set of predicates all the items must have. Let’s see the usage that can be made of this data through three different SPARQL queries, introduced from the simple one to the most sophisticated:

  • List the URIs of all the boxes and the items they contain
  • SELECT ?box ?item WHERE {
    ?box rdf:type olpc:Box.
    ?box olpc:hasItem ?item.
    }
    
  • List of the items and their attributes
  • SELECT ?item ?property ?val WHERE {
      ?item rdf:type olpc:Item.
      ?item ?property ?val.
    }
    
  • List of the items that are not in a box
  • SELECT ?item WHERE {
      ?item rdf:type olpc:Item.
      OPTIONAL {
        ?box rdf:type olpc:Box.
        ?box olpc:hasItem ?item.
      }
      FILTER (!bound(?box))
    }
    

These three queries are just some examples, the really nice thing about this query mechanism is that (almost) anything can be asked through SPARQL. There is no need to define a set of API calls to cover a list of anticipated needs, as soon as the SPARQL end point is made available every activity may ask whatever it wants to ask! 🙂

We are not done yet as there is still a lot to develop to finish the application (game mechanism, sharing of items, …). If you are interested in knowing more about the clustering prototype, feel free to drop a comment on this post and/or follow this activity on GitHub. You can also find more information in this technical report about the current achievements of SemanticXO and the ongoing work.