
Archive for the ‘SemanticXO’ Category

ICT 4 Development course final presentations


via ICT 4 Development course final presentations.

First release of SemanticXO!

Here it is: the first fully featured release of SemanticXO! Use it in your activities to store and share any kind of structured information with other XOs. The installation procedure is easy and only requires an XO-1 running operating system version 12.1.0. Go to the GIT repository and download the files “setup.sh” and “semanticxo.tar.gz” (found in the directory “patch_my_xo”) to somewhere on the XO. Then, log in as root and execute “sh setup.sh setup”. The installation package will copy the API onto the XO, set up the triple store and install two demo activities. Once the procedure is complete, reboot the XO to activate everything.

The XO after the installation of SemanticXO

The two demo activities are described in more detail on the project page. Under the hood, SemanticXO provides an API to store named graphs containing the description of one or several resources. These named graphs are annotated with an author name, a modification date and, optionally, a list of other devices (identified by their URI) to share the graph with. This data is used by a graph replication daemon which, every 5 minutes, browses the network using Avahi, finds other triple stores and downloads a copy of the graphs that are shared with it. The data backend of the mailing activity provides a good example of how the API is used.
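The storage model and the replication step described above can be sketched in Python. This is a minimal, hypothetical illustration and not the SemanticXO API: the NamedGraph class, the graphs_to_copy and replicate_once functions, and the rule that a remote graph is only copied when it is newer than the local copy are all assumptions. Network discovery with Avahi is replaced by a plain list of peers.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NamedGraph:
    """A named graph plus the metadata described above (hypothetical structure)."""
    uri: str
    author: str
    modified: datetime
    share_with: list[str] = field(default_factory=list)  # URIs of target devices
    triples: set[tuple[str, str, str]] = field(default_factory=set)


def graphs_to_copy(remote_graphs, my_device_uri, local_graphs):
    """Return the remote graphs this device should download.

    Assumption: a graph is copied when it is shared with this device and is
    either unknown locally or newer than the local copy.
    """
    selected = []
    for graph in remote_graphs:
        if my_device_uri not in graph.share_with:
            continue  # not shared with us
        local = local_graphs.get(graph.uri)
        if local is None or graph.modified > local.modified:
            selected.append(graph)
    return selected


def replicate_once(peers, my_device_uri, local_graphs):
    """One pass of the replication loop.

    `peers` stands in for the triple stores discovered on the network
    (found via Avahi in SemanticXO); here it is just a list of graph lists.
    """
    for remote_graphs in peers:
        for graph in graphs_to_copy(remote_graphs, my_device_uri, local_graphs):
            local_graphs[graph.uri] = graph
    return local_graphs
```

In a real daemon, a pass like replicate_once would be repeated every 5 minutes against whichever triple stores are currently visible on the network.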

Downscaling Entity Registries for Poorly-Connected Environments


VeriSign logo (Photo credit: Wikipedia)

Emerging online applications based on the Web of Objects or Linked Open Data typically assume that connectivity to data repositories and entity resolution services is always available. This assumption does not hold in many cases: there are about 4.5 billion people in the world who have no or limited Internet access. Many data-driven applications could have a critical impact on the lives of those people, but remain inaccessible to them because of the architecture of today’s data registries.

Domain name registries are one example of data registries: they are databases containing registered Internet domain names. Every Web user who wants to visit a website by its URL (e.g. http://semweb4u.wordpress.com) rather than by its IP address (e.g. http://76.74.254.120) relies on them. Another example of a data registry is the Digital Object Architecture (DOA), which assigns unique identifiers to digital objects (e.g. scientific publications).

Registries are critical components of today’s Internet architecture. They are widely used in everyday Web activities, but their usage is severely impaired in poorly connected or ad-hoc environments. In this context, centralized data management, as typically used by current data registries, is of limited practicality, if it is possible at all. There is a need for hybrid models mixing decentralized and hierarchical infrastructures to support data-driven applications in environments with limited Internet connectivity.
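A hybrid lookup of this kind could, for instance, answer from a local cache first, then ask peers on the local mesh, and only fall back to a central registry when Internet access is available. The sketch below is a hypothetical illustration of that idea, not the project’s design: the HybridRegistry class, the order of the tiers and the caching policy are assumptions, and resolvers are modelled as plain functions.

```python
from typing import Callable, Optional

# A resolver maps a name to an address, or None if it does not know it.
Resolver = Callable[[str], Optional[str]]


class HybridRegistry:
    """Resolve names locally, then via mesh peers, then centrally (hypothetical)."""

    def __init__(self, peers: list[Resolver], central: Optional[Resolver] = None):
        self.cache: dict[str, str] = {}
        self.peers = peers        # decentralized tier: other devices on the mesh
        self.central = central    # hierarchical tier; None when offline

    def _remember(self, name: str, answer: Optional[str]) -> Optional[str]:
        if answer is not None:
            self.cache[name] = answer  # keep it for when connectivity drops
        return answer

    def resolve(self, name: str) -> Optional[str]:
        if name in self.cache:
            return self.cache[name]
        for peer in self.peers:
            answer = peer(name)
            if answer is not None:
                return self._remember(name, answer)
        if self.central is not None:
            return self._remember(name, self.central(name))
        return None
```

Caching every answer lets a device keep resolving names it has already seen after it loses connectivity; a real system would also need expiry and conflict handling, which this sketch leaves out.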

Philippe Cudré-Mauroux and I received a $200,000 research grant from VeriSign Inc. (PDF version) to investigate such novel approaches for data registries. During this 12-month project, we will develop decentralized solutions to the problems of entity publication, search, de-duplication, storage and caching. A running prototype will be tested on the XO, a laptop used by young learners in developing countries, most often in a mesh context with limited Internet connectivity.

Please don’t hesitate to contact us for more information about this project; we’d be happy to talk more about our plans :-)

1 minute video about SemanticXO

February 28, 2012

The VU is making short, one-minute videos to highlight some of the research being done within its walls. This is the video for SemanticXO, realised by Pepijn Borgwat and presented by Laurens Rietveld.

The script is in Dutch and is as follows:

  • Ik ben Laurens Rietveld en ik doe onderzoek aan de Vrije Universiteit naar semantische netwerken.
  • Ik wil iets vertellen over onderzoek van Christophe Guéret dat zich richt op laptops die in ontwikkelingslanden gebruikt worden.
  • Dit is de XO laptop, het is een goedkope stevige laptop die onderwijs bij kinderen moet bevorderen.
  • Op de laptop draait Sugar, dat is een constructieve leeromgeving speciaal ontworpen voor jonge leerlingen.
  • Op dit moment blijven alle gegevens die gegenereerd worden in de leeromgeving, in de XO laptop. Als een gesloten kleine data doos.
  • Met dit onderzoek willen we data uitwisseling verbeteren door gebruik te maken van principes van het semantic web.
  • Op die manier kan de data, zoals berichten of tekeningen, gemakkelijk binnen kleine lokale netwerken worden verspreid.
  • Zodra 1 laptop met het netwerk verbonden is kan die lokale data delen met de buitenwereld.
  • Andersom kunnen gegevens van de rest van het internet, ook binnen het lokale netwerk worden gedeeld.

In case you don’t speak Dutch, you may find the following translation to be useful ;-)

  • My name is Laurens Rietveld and I do research on Semantic Networks at the Free University of Amsterdam.
  • I will tell you about the research of Christophe Guéret which concerns laptops being used in developing countries.
  • This is the XO laptop, a cheap and robust laptop used to support the education of kids.
  • The laptop runs “Sugar”, a constructionist learning environment especially designed for young learners.
  • Currently, all the data generated within the learning environment stays in the XO, as if it were in a closed data silo.
  • With this research we aim to improve data sharing capabilities by using Semantic Web technologies.
  • In doing so, the data (for instance, messages or drawings) can be easily shared within a local network.
  • As soon as one of these laptops gets access to the Internet, it becomes possible to share this data with the outside world too.
  • Vice versa, data from the Internet can be downloaded and shared across the local network.

Does it scale?

Scaling is often a central question for data-intensive projects, whether or not they make use of Semantic Web technologies, and SemanticXO is no exception. The triple store is used as a back end for the Sugar Journal, a central component recording the usage of the different activities. This short post discusses the results found for two questions: “how many Journal entries can the triple store sustain?” and “how much disk space is used to store the Journal entries?”

Answering these questions means loading Journal entries into the store and measuring the read and write performance along with the disk space used. This is done by a script which randomly generates Journal entries and inserts them into the store. A text sampler and the real names of activities are used to make these entries realistic in terms of size. An example of such a generated entry, serialised in HTML, can be seen there. The following graphs show the results obtained for inserting 2000 Journal entries. These figures have been averaged over 10 runs, each starting with a freshly created store. The triple store used is “RedStore”, run with a hash-based Berkeley DB backend. The test machine is an XO-1 running software release 11.2.0.
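The measurement procedure can be sketched as the following harness. It is a hypothetical sketch, not the actual performance script: a Python dict stands in for RedStore, and the activity names, word list and entry fields are illustrative assumptions. To reproduce the real measurements, the write and read steps would be replaced by insertions into and queries against the triple store.

```python
import random
import statistics
import time

# Hypothetical stand-ins: the real script uses a text sampler, the real
# activity names and RedStore; here a dict plays the role of the store.
ACTIVITIES = ["Write", "Paint", "Record", "Browse", "Chat"]
WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()


def random_entry(rng, i):
    """Build one fake Journal entry with a random activity and text."""
    return {
        "id": f"entry-{i}",
        "activity": rng.choice(ACTIVITIES),
        "title": " ".join(rng.choices(WORDS, k=4)),
        "description": " ".join(rng.choices(WORDS, k=rng.randint(20, 200))),
    }


def run_once(n_entries, seed):
    """Fill a fresh store; return (entry count, write s, read s) per step."""
    rng = random.Random(seed)
    store = {}  # a freshly created store for every run
    keys = []
    results = []
    for i in range(n_entries):
        entry = random_entry(rng, i)
        start = time.perf_counter()
        store[entry["id"]] = entry  # write one entry
        write_s = time.perf_counter() - start
        keys.append(entry["id"])
        key = rng.choice(keys)  # pick a random existing entry to read back
        start = time.perf_counter()
        _ = store[key]
        read_s = time.perf_counter() - start
        results.append((i + 1, write_s, read_s))
    return results


def benchmark(n_entries=2000, runs=10):
    """Average write and read times for each store size over several runs."""
    all_runs = [run_once(n_entries, seed) for seed in range(runs)]
    averaged = []
    for rows in zip(*all_runs):  # the same step taken from every run
        averaged.append((
            rows[0][0],
            statistics.mean(r[1] for r in rows),
            statistics.mean(r[2] for r in rows),
        ))
    return averaged
```

An in-memory dict will of course not show the slowdown observed with the triple store; the point of the sketch is the structure (fresh store per run, one write and one read per step, averaging across runs).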

The disk space used is minimal for up to 30 entries, grows rapidly between 30 and 70 entries and then increases linearly. The maximum space occupied is a little under 100 MB, which is small compared to the 1 GB of storage of the XO-1.


Amount of disk space used by the triple store
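A quick back-of-the-envelope check of these figures, using 100 MB as an upper bound for 2000 entries and treating 1 GB as 1000 MB for simplicity:

```python
# Figures from the text: a bit less than 100 MB for 2000 entries, 1 GB of storage.
used_mb = 100      # upper bound on the disk space used
entries = 2000
storage_mb = 1000  # 1 GB, using decimal units for simplicity

per_entry_kb = used_mb * 1000 / entries  # average cost of one Journal entry
share = used_mb / storage_mb             # fraction of the XO-1 storage

print(f"at most ~{per_entry_kb:.0f} KB per entry on average, ~{share:.0%} of storage")
# prints: at most ~50 KB per entry on average, ~10% of storage
```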

The results for read and write delays are less good news. Write operations take constant time, always around 0.1 s. Reading an entry from the triple store, however, gets linearly slower as the store fills up. For up to 600 entries, the retrieval time of an entry stays below a second, which should provide a reasonable response time. With 2000 entries stored, however, the retrieval time goes as high as 7 seconds :-(

Read and write access time

The answer to the question we started with (“Does it scale?”) is then “yes, for up to 600 entries”, considering a first-generation device and the current status of the software components (SemanticXO/RedStore/…). This answer also raises new questions, among which: Are 600 entries enough for typical usage of the XO? Is it possible to improve the software to get better results? How do the results look on more recent hardware?
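The “yes, for up to 600 entries” answer amounts to finding the largest store size whose read time stays under a latency budget, here the one second mentioned above. A minimal sketch, assuming the measurements are available as (entry count, read time in seconds) pairs; the function name and input format are hypothetical:

```python
def max_entries_within(samples, budget_s=1.0):
    """Largest store size whose measured read time stays under budget_s.

    `samples` is an iterable of (entry_count, read_seconds) pairs, such as
    averaged benchmark results. Since read time grows with the store size,
    the scan stops at the first measurement that exceeds the budget.
    """
    best = 0
    for count, read_s in sorted(samples):
        if read_s >= budget_s:
            break
        best = count
    return best
```

Making the budget a parameter keeps the conclusion explicit: a stricter budget (say half a second) would give a smaller limit from the same measurements.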

I would appreciate some help answering all of these questions, especially the last one. I only have an XO-1 and thus cannot run my script on an XO-1.5 or XO-1.75. If you have such a device and are willing to help me get the results, please download the package containing the performance script and the triple store and follow the instructions for running it. After a day or so of execution, this script will generate three CSV files that I can then postprocess to get curves similar to the ones shown above.
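Postprocessing the CSV files mostly means averaging each measurement over the runs for every store size. The layout of the CSV files is not described here, so this sketch assumes, hypothetically, rows of the form run,entries,value with an optional header line; average_by_entries is an illustrative name, not part of the package.

```python
import csv
from collections import defaultdict
from statistics import mean


def average_by_entries(lines):
    """Average a measurement over runs, grouped by number of entries.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Assumed (hypothetical) row layout: run,entries,value
    """
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if not row or not row[0].isdigit():  # skip blank lines and a header
            continue
        _run, entries, value = row
        groups[int(entries)].append(float(value))
    # One averaged point per store size, ordered by size, ready to plot.
    return {entries: mean(values) for entries, values in sorted(groups.items())}
```

For example, opening a hypothetical “read.csv” with newline="" and passing the file object to average_by_entries would yield one averaged read time per store size.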
