
Posts Tagged ‘olpc’

First release of SemanticXO!

Here it is: the first fully featured release of SemanticXO! Use it in your activities to store and share any kind of structured information with other XOs. The installation procedure is easy and only requires an XO-1 running the operating system version 12.1.0. Go to the Git repository and download the files “setup.sh” and “semanticxo.tar.gz” somewhere on the XO (these files are in the directory “patch_my_xo”). Then, log in as root and execute “sh setup.sh setup”. The installation package will copy the API onto the XO, set up the triple store and install two demo activities. Once the procedure is complete, reboot the XO to activate everything.

The XO after the installation of SemanticXO

There are two demo activities, which are described in more detail on the project page. Under the hood, SemanticXO provides an API to store named graphs containing descriptions of one or more resources. These named graphs are marked with an author name, a modification date and, optionally, a list of other devices (identified by their URI) to share the graph with. This data is used by a graph replication daemon which, every 5 minutes, browses the network using Avahi, finds other triple stores, and downloads a copy of the graphs that are shared with it. The data backend of the mailing activity provides a good example of how the API is used.
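As a rough illustration, here is what such graph metadata could look like. This is a sketch only: the vocabulary URIs, the graph name and the use of the rdflib library are assumptions made for the example, not the actual SemanticXO API – the mailing activity mentioned above shows the real calls.

    # Not the actual SemanticXO API: an illustration of the metadata shape,
    # built with rdflib. The vocabulary URIs, graph name and device URI are
    # all made up for the example.
    from datetime import datetime, timezone
    from rdflib import Graph, Literal, Namespace, URIRef

    SXO = Namespace("urn:x-semanticxo:")             # hypothetical vocabulary
    graph_uri = URIRef("urn:x-semanticxo:graph:42")  # hypothetical graph name

    meta = Graph()
    meta.add((graph_uri, SXO.author, Literal("christophe")))
    meta.add((graph_uri, SXO.modified,
              Literal(datetime.now(timezone.utc).isoformat())))
    # A device (identified by its URI) the graph should be shared with:
    meta.add((graph_uri, SXO.sharedWith, URIRef("urn:x-device:xo-abcd")))

    print(meta.serialize(format="turtle"))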


Downscaling Entity Registries for Poorly-Connected Environments

VeriSign logo (Photo credit: Wikipedia)

Emerging online applications based on the Web of Objects or Linked Open Data typically assume that connectivity to data repositories and entity resolution services is always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Internet access. Many data-driven applications could have a critical impact on the lives of these people, yet remain inaccessible to them due to the architecture of today’s data registries.

Examples of data registries include the domain name registries. These are databases containing registered Internet domain names. They are necessary for all Web users wishing to visit a website knowing its URL (e.g. https://semweb4u.wordpress.com) rather than its IP address (e.g. http://76.74.254.120). Another example of a data registry is the Digital Object Architecture (DOA), which assigns unique identifiers to digital objects (e.g. scientific publications).

Registries are critical components of today’s Internet architecture. They are widely used in everyday Web activities, but their usage is severely impaired in poorly connected or ad-hoc environments. In this context, centralized data management – as typically used by current data registries – is of limited practicability, if possible at all. There is a need for hybrid models mixing decentralized and hierarchical infrastructures to support data-driven applications in environments with limited Internet connectivity.

Philippe Cudré-Mauroux and I received a $200,000 research grant from VeriSign Inc. (PDF version) to investigate such novel approaches for data registries. During this 12-month project, we will develop decentralized solutions to the problems of entity publication, search, de-duplication, storage and caching. A running prototype will be tested on the XO laptop, a laptop used by young learners in developing countries – most often in a mesh context with limited Internet connectivity.

Please don’t hesitate to contact us for more information about this project; we’d be happy to talk more about our plans 🙂

1 minute video about SemanticXO

The VU is making one-minute videos to highlight some of the research being done within its walls. This is the video for SemanticXO, realised by Pepijn Borgwat and presented by Laurens Rietveld.

The script is in Dutch and is as follows:

  • Ik ben Laurens Rietveld en ik doe onderzoek aan de Vrije Universiteit naar semantische netwerken.
  • Ik wil iets vertellen over onderzoek van Christophe Gueret dat zich richt op laptops die in ontwikkelingslanden gebruikt worden.
  • Dit is de XO laptop, het is een goedkope stevige laptop die onderwijs bij kinderen moet bevorderen.
  • Op de laptop draait Sugar, dat is een constructieve leeromgeving speciaal ontworpen voor jonge leerlingen.
  • Op dit moment blijven alle gegevens die gegenereerd worden in de leeromgeving, in de XO laptop. Als een gesloten kleine data doos.
  • Met dit onderzoek willen we data uitwisseling verbeteren door gebruik te maken van principes van het semantic web.
  • Op die manier kan de data, zoals berichten of tekeningen, gemakkelijk binnen kleine lokale netwerken worden verspreid.
  • Zodra 1 laptop met het netwerk verbonden is kan die lokale data delen met de buitenwereld.
  • Andersom kunnen gegevens van de rest van het internet, ook binnen het lokale netwerk worden gedeeld.

In case you don’t speak Dutch, you may find the following translation to be useful 😉

  • My name is Laurens Rietveld and I do research on Semantic Networks at the Free University of Amsterdam.
  • I will tell you about the research of Christophe Guéret, which concerns laptops being used in developing countries.
  • This is the “XO” laptop, a cheap and robust laptop used to support the education of kids.
  • The laptop runs “Sugar”, a constructionist learning environment especially designed for young learners.
  • Currently, all the data that is generated within the learning environment stays in the XO, as if it were in a closed data silo.
  • With this research we aim to improve data sharing capabilities by using Semantic Web technologies.
  • In doing so, the data (for instance, messages or drawings) can be easily shared within a local network.
  • As soon as one of these laptops gets access to the Internet, it becomes possible to share this data with the outside world too.
  • Vice versa, data from the Internet can be downloaded and shared across the local network.

Does it scale?

Scaling is often a central question for data-intensive projects, whether they make use of Semantic Web technologies or not, and SemanticXO is no exception. The triple store is used as a back end for Sugar’s Journal, a central component recording the usage of the different activities. This short post discusses the results found for two questions: “how many Journal entries can the triple store sustain?” and “how much disk space is used to store the Journal entries?”

Answering these questions means loading some Journal entries and measuring the read and write performances along with the disk space used. This is done by a script which randomly generates Journal entries and inserts them into the store. A text sampler and the real names of activities are used to make these entries realistic in terms of size. An example of such a generated entry, serialised in HTML, can be seen here. The following graphs show the results obtained when inserting 2000 Journal entries. These figures have been averaged over 10 runs, each of them starting with a freshly created store. The triple store used is “RedStore”, run with a hash-based BerkeleyDB backend. The test machine is an XO-1 running the software release 11.2.0.
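For the curious, the sketch below gives an idea of what such a measurement script can look like. The endpoint paths and the vocabulary are assumptions made for illustration; the actual script is the one used to produce the figures below.

    # A minimal sketch of the kind of measurement script described above.
    # Endpoint paths and vocabulary are assumptions, not the real script.
    import csv, random, string, time
    import urllib.parse, urllib.request

    UPDATE_URL = "http://127.0.0.1:8080/update"  # assumed update endpoint
    QUERY_URL = "http://127.0.0.1:8080/sparql"   # assumed query endpoint
    ACTIVITIES = ["Write", "Paint", "Chat", "Browse"]  # sample activity names

    def random_text(length):
        # Crude stand-in for the text sampler mentioned above.
        return "".join(random.choice(string.ascii_lowercase + " ")
                       for _ in range(length))

    def timed(url, data=None):
        # Returns the wall-clock time taken by one HTTP request.
        start = time.time()
        urllib.request.urlopen(url, data).read()
        return time.time() - start

    def write_entry(index):
        # Each Journal entry gets its own named graph.
        graph = "urn:journal:entry:%d" % index
        update = ('INSERT DATA { GRAPH <%s> { <%s> '
                  '<urn:journal:activity> "%s" ; '
                  '<urn:journal:title> "%s" } }'
                  % (graph, graph, random.choice(ACTIVITIES), random_text(40)))
        return timed(UPDATE_URL,
                     urllib.parse.urlencode({"update": update}).encode())

    def read_entry(index):
        query = ('SELECT ?p ?o WHERE { GRAPH <urn:journal:entry:%d> '
                 '{ ?s ?p ?o } }' % index)
        return timed(QUERY_URL + "?" + urllib.parse.urlencode({"query": query}))

    with open("timings.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["entry", "write_s", "read_s"])
        for i in range(2000):
            writer.writerow([i, write_entry(i), read_entry(i)])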

The disk space used is minimal for up to 30 entries, grows rapidly between 30 and 70 entries, and continues growing linearly from that point on. The maximum space occupied is a bit less than 100MB, which is only a small fraction of the 1GB of storage of the XO-1.

 

Amount of disk space used by the triple store

The results for the read and write delays are less good news. Write operations take constant time, always around 0.1s. Getting an entry from the triple store, however, proves to get linearly slower as the triple store fills up. It can be noticed that for up to 600 entries, the retrieval time of an entry stays below a second, which should provide a reasonable response time. However, with 2000 entries stored, the retrieval time goes as high as 7 seconds 😦

Read and write access time

The answer to the question we started with (“Does it scale?”) is then “yes, for up to 600 entries”, considering a first-generation device and the current status of the software components (SemanticXO/RedStore/…). This answer also yields new questions, among which: Are 600 entries enough for a typical usage of the XO? Is it possible to improve the software to get better results? How are the results on more recent hardware?

I would appreciate a bit of help answering all of these, and especially the last one. I only have an XO-1 and thus cannot run my script on an XO-1.5 or XO-1.75. If you have such a device and are willing to help me get the results, please download the package containing the performance script and the triple store and follow the instructions for running it. After a day of execution or so, this script will generate three CSV files that I can then postprocess to get curves similar to the ones shown above.

Is data sharing the privilege of a few?

Update: The following idea got the first prize of the “outrageous ideas” track of ISWC2011! Many thanks to everyone who voted for it 🙂

Over the last couple of years, we have engineered a fantastic data sharing technology based on open standards from the W3C: Linked Data. Using Linked Data, it is possible to express some knowledge as a set of facts and connect the facts together to build a network. Having such networked data openly accessible is a source of economic and societal benefits. It enables sharing data in an unambiguous, open and standard way, just as the Web enabled document sharing. Yet, the way we designed it deprives the majority of the World’s population from using it.

Doing “Web-less” Linked Data?

The problem may lie in the fact that Linked Data is based on Web technologies, or in the fact that Linked Data has been designed and engineered by individuals with easy access to the Web, or maybe a combination of both. Nowadays, Linked Data rhymes with Cloud-hosted data storage services, a set of (Web-based) applications to interact with these services and the infrastructure of the Web. As a result, if you don’t have access to this Web infrastructure, you cannot use Linked Data. Which is a pity, because an estimated 4.5 billion persons don’t have access to it for various reasons (lack of infrastructure, cost of access, literacy issues, …). Wouldn’t it be possible to adjust our design choices to ensure they could also benefit from Linked Data, even if they don’t have the Web? The answer is yes, and the best news is that it wouldn’t be that hard either. But for it to happen, we need to adapt both our mindset and our technologies.

Changing our mindset

We have a tendency to think that any data sharing platform is a combination of a cloud-based data store, some client applications to access the data and forms to feed new data into the system. This is not always applicable, as central hosting of data may not be possible, or its access from client applications may not be guaranteed. We should also think of the part of the World which is illiterate and for which Linked Data, and the Web, are not accessible. In short, we need to think de-centralised, small and vocal in order to widen access to Linked Data.

Think de-centralised

Star-shaped networks can be hard to deploy. They imply setting up a central producer of resources somewhere and connecting all the clients to it. Electricity networks have already found a better alternative: microgrids. Microgrids are small networks of producers/consumers (the “prosumers”) of electricity that locally manage the electricity needs. We could, and should, copy this approach to manage local data production and consumption. For example, think of a decentralised DBpedia whose content would be made of the aggregation of several data sources, each producing part of the content – most likely, the content that is locally relevant to them.

Think small

Big servers require more energy and more cooling. They usually end up racked into big cabinets that in turn are packed into cooled data centers. These data centers need to be big in order to cope with scale issues. Thinking decentralised allows us to think small, and we need to think small to provide alternatives to data centers where these are not available. As content production and creation go decentralised, several small servers can be used. To continue the analogy with microgrids, we can name these small servers taking care of locally relevant content “micro-servers”.

Think vocal

Unfortunately, not everyone can read and type. In some African areas, knowledge is shared through vocal channels (mobile phone, meetings, …) because there is no other alternative. Access to knowledge exchanged that way cannot be gained using form-based data acquisition systems. We need to think of exploiting vocal conversations through Text To Speech (TTS) and Automatic Speech Recognition (ASR) rather than staying focused on forms.

Changing our technologies

Changing mindsets is not enough: if we aim at stripping the Web away from Linked Data, we also need to pay attention to our technologies and adapt them. In particular, there are 5 upcoming challenges that can be phrased as research questions:

  1. Dereferencability: How do you get a route to the data if you want to avoid using the routing system provided by the Web? For instance, how do you dereference host-name-based URIs if you don’t have access to the DNS network?
  2. Consistency: In a decentralised setting where several publishers produce parts of a common data set, how do you ensure URIs are re-used and non-colliding? There is a chance that two different producers would use the same URI to describe different things.
  3. Reliability: Unlike centrally hosted data servers, micro-servers cannot be asked to provide 99% availability. They may go on and off unexpectedly. The first thing to know is whether that is an issue or not. The second is, if we should ensure their data remains available, how do we achieve this?
  4. Security: This is also related to having a swarm of micro-servers serving a particular dataset. If any micro-server can produce a chunk of that dataset, how do you avoid having a spammer getting in and starting to produce falsified content? If we want to avoid centralized networks, authority-based solutions such as Public Key Infrastructure (PKI) are not an option. We need to find decentralised authentication mechanisms.
  5. Accessibility: How do we make Linked Data accessible to those who are illiterate? As highlighted earlier, not everyone can read and write, but illiterate persons can still talk. We need to take vocal technologies more into account in order to make Linked Data accessible to them. We can also investigate graphics-based data acquisition techniques with visual representations of information.

More about this

This is a presentation that Stefan Schlobach gave at ISWC2011 on this topic:

You are also invited to read the associated paper “Is data sharing the privilege of a few? Bringing Linked Data to those without the Web” and check out two projects working on the mentioned challenges: SemanticXO and Voices.

Updates about SemanticXO

With the last post about SemanticXO dating back to April, it’s time for an update, isn’t it? 😉

A lot of things have happened since April. First, a paper about the project was accepted for presentation at the First International Conference on e-Technologies and Networks for Development (ICeND2011). Then, I spoke about the project during the symposium of the Network Institute as well as during SugarCamp #2. Lastly, a first release of a triple-store-powered Journal is now available for testing.

Publication

The paper entitled “SemanticXO: connecting the XO with the World’s largest information network” is available from Mendeley. It explains what the goal of the project is and then reports on some performance assessments and a first test activity. Most of the information it contains has actually been published before on this blog (cf. here and here) but if you want a global overview of the project, this paper is still worth a read. The conference itself was very nice and I did some networking. I came back with a lot of business cards and the hope of keeping in touch with the people I met there. The slides from the presentation are available from SlideShare.

Presentations

The Network Institute of Amsterdam organised a one-day symposium on May 10 to strengthen the ties between its members and to stimulate further collaboration. This institute is a long-term collaboration between groups from the Department of Computer Science, the Department of Mathematics, the Faculty of Social Sciences and the Faculty of Economics and Business Administration. I presented a poster about SemanticXO and an abstract went into the proceedings of the event.

More recently, I spent the 10th and 11th of September in Paris for Sugar Camp #2, organised by OLPC France. Bastien arranged a bit of time for me on Sunday afternoon to re-do the presentation from ICeND2011 (thanks again for that!) and get some feedback. This was a very well organised event held at a cool location (“La cité des sciences”), and it was also the first time I met so many other people working on Sugar. I could finally put some faces on the names I had seen so many times on the mailing lists and in the Git logs 🙂

First SemanticXO prototype

The project development effort is split into three parts: a common layer hiding the complexity of SPARQL, a new implementation of the Journal datastore, and the coding of diverse activities making use of the new semantic capabilities. All three are going on more or less in parallel, at different speeds, as, for instance, the work on activities directs what the common layer will contain. I’ve focused my efforts on the Journal datastore to get something ready to test. It’s a very first prototype, coded by starting from the genuine datastore 0.92 and replacing the part in charge of the metadata; the code taking care of the files remains the same. This new datastore is available from Gitorious, but because installing the triple store and replacing the Journal is a tricky manual process, I bundled all of that 😉

Installation

The installation bundle consists of two files, an archive “semanticxo.tgz” and a script “patch-my-xo.sh”. To install SemanticXO, download the two, put them in the same location somewhere on your machine and then type (as root):

sh ./patch-my-xo.sh setup

This will install a triple store, add it to the daemons started at boot time and replace the default Journal with one using the triple store. Be careful to have backups if needed, as this will remove all the content previously stored in the Journal! Once the script has been executed, reboot the machine to start using the new software.

The bundle has been tested on an XO-1 running the software release 11.2.0 but it should work with any software release on both the XO-1 and XO-1.5. This bundle won’t work on the XO-1.75 as it contains a binary (the triple store) not compiled for ARM.

What now?

Now that you have the thing installed, open the browser and go to “http://127.0.0.1:8080”. You will see the web interface of the triple store, which allows you to issue SPARQL queries and see which named graphs are stored. If you are not fluent in SPARQL, the named-graph interface is the most interesting part to play with. Every entry in the Journal gets its own named graph; after having populated the Journal with some entries you will see this list of named graphs grow. Click on one of them and the content of the Journal entry will be displayed. Note that this web interface is also accessible from any other machine on the same network as the XO. This yields new opportunities in terms of backup and information gathering: a teacher can query the Journal of any XO directly from a school server, or another XO.
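As an illustration of this remote access, the following sketch lists the named graphs (one per Journal entry) of an XO from another machine. The XO address and the “/sparql” endpoint path are assumptions to adapt to your own setup.

    # A sketch: list the named graphs (one per Journal entry) of an XO
    # from another machine on the network. The XO address and the /sparql
    # path are assumptions, not a documented SemanticXO interface.
    import urllib.parse, urllib.request

    XO_ADDRESS = "192.168.1.42"  # hypothetical address of the XO
    query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"
    url = ("http://%s:8080/sparql?" % XO_ADDRESS
           + urllib.parse.urlencode({"query": query}))
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    print(urllib.request.urlopen(request).read().decode())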

Removing

The patch script comes with an uninstall function if you want to revert the XO to its original setup. To use it, simply type (as root):

sh ./patch-my-xo.sh remove

and then reboot the machine.

Status of SemanticXO

Wayan recently blogged about the SemanticXO project, asking about its current status. Unfortunately, I couldn’t comment on his blog so I’d like to answer his question here. Daniel also expressed some doubts about the Semantic Web, so I’ll try to clarify what this is all about.

To be honest, I’m not sure what that really means. Is this a database project? Is it to help translation of the Sugar User Interface? Or are children somehow to use SemanticXO in their language acquisition?

Semantic technologies are knowledge representation tools used to model factual information – for instance, “Amsterdam, isIn, Netherlands”. These facts are stored in optimised databases called triple stores. So, yes, it is kind of a database project which aims at installing such a triple store and providing an API for using it. The technologies developed for the Semantic Web are particularly suited to storing and querying multilingual data, so activities that need to store text in different languages would directly benefit from this feature. The triple store could indeed eventually be used instead of the .po files to store multilingual data for Sugar.
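As a small illustration of the model, the fact above can be expressed and queried back with a few lines of Python, here using the rdflib library (purely for illustration – this is not necessarily what SemanticXO uses internally, and the namespace is made up):

    # The fact “Amsterdam, isIn, Netherlands” written as a triple and
    # queried back, using rdflib purely for illustration.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")  # made-up namespace
    g = Graph()
    g.add((EX.Amsterdam, EX.isIn, EX.Netherlands))

    query = ("SELECT ?place WHERE { ?place <http://example.org/isIn> "
             "<http://example.org/Netherlands> }")
    for row in g.query(query):
        print(row.place)  # -> http://example.org/Amsterdam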

The goal of SemanticXO is not only to provide an API for using a triple store on the XO but also to provide access to the data published using Semantic Web technologies. Many data sets have been published on the Web, providing a network of more than 27 billion facts that can be queried and combined. Although not exhaustive, the Linked Open Data (LOD) cloud provides a good idea of the amount of data out there. With SemanticXO, an activity developer will be able to simply get the population of Amsterdam, or the exact location of Paris, or the population of London, or whatever. The LOD cloud can be queried just like a database and it contains a lot of information about many topics. And because the XO will itself be able to use the same publication system, the kids using Sugar will be able to publish their data on the cloud directly from an activity.

Currently, it is hard, if not impossible, to get such atomic information and just insert it somewhere into an activity with a few lines of code…
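To give a taste of what those few lines could look like with Semantic Web technologies, here is a sketch that asks the public DBpedia SPARQL endpoint for the population of Amsterdam. It assumes, of course, Internet access, that the endpoint is up, and that the figure is recorded under the “populationTotal” property:

    # A sketch of those "few lines of code": asking the public DBpedia
    # SPARQL endpoint for the population of Amsterdam.
    import json, urllib.parse, urllib.request

    QUERY = """
    SELECT ?population WHERE {
      <http://dbpedia.org/resource/Amsterdam>
          <http://dbpedia.org/ontology/populationTotal> ?population .
    }
    """
    url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode(
        {"query": QUERY, "format": "application/sparql-results+json"})
    results = json.load(urllib.request.urlopen(url))
    for binding in results["results"]["bindings"]:
        print(binding["population"]["value"])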

Regardless of its purpose, it seems that SemanticXO development has come to a halt. The only other post from Christophe Guéret detailed RedStore running on the XO, where he noted the challenges of installing a TripleStore on an XO using RedStore, namely that RedStore depends on some external libraries that are not yet packaged for Fedora11 and since it’s not so easy to compile directly on the XO, a second computer is required.

This post was published on the 11th of April 2011. To date, there have been three posts about SemanticXO: the introduction (posted on December 15, 2010), the installation of a triple store (posted on December 20, 2010) and a first activity using the triple store (posted on April 5, 2011). So there was one other post made since the installation of the triple store. But that first step of installing a triple store was indeed important for what I want to do with SemanticXO, and it was not easy to find one that would fit the low specs of an XO-1. The installation was then a bit challenging because of the dependencies, but nothing really exceptional there. Ideally, the triple store will come installed by default on the OLPC OS releases some day 🙂

Once installed, the XO didn’t return queries quickly. The XO failed on a number of benchmark queries for triple stores, even after being executed over a full night.

I was pleased, surprised and relieved to see that the triple store worked in the first place! From what I know, it was the first time a triple store had run on such low-spec hardware and I wanted to see how far I could push it. So I loaded a significant amount of triples (50k) and ran the testing suite we typically use to test triple store performances. As expected, the response time was long and most complex queries just failed. But these evaluation systems are aimed at testing big triple stores on big hardware, and the queries are designed to see how the triple store deals with extreme cases. Considering that on the oldest generation of XO the triple store managed to answer queries way more complex than the ones it is expected to deal with, I found the results acceptable and decided to move on to the next steps.

So Christophe, what does this mean? Is a Semantic Web for children using the XO possible?

Yes, it is possible and I’m still actively working on it! The development is going slower than I would like it to, as, like many contributors, I work on this project in my spare time, but it is going on. The last post on this blog shows an activity using the store for its internal data and contains a pointer to a technical report that, I hope, will shed more light on the project’s goals and status. Right now, I’m working on extending this activity and implementing a drop-in replacement for the datastore that would use the triple store to store metadata about the different entries. This clustering activity only shows how activities in Sugar can store data using the triple store, so I’m also working on an activity that will show the other aspect: how the same concepts can be used to get data from the LOD cloud and display it.

I have been able to detect no clear correlation between use of the term “Semantic Web” and knowledge of what it means. I think everybody just read it in Wired in 1999 and filed it away as a really good thing to put on a square of your Buzzword Bingo card.

Since 1999, and until a few years ago, the Semantic Web had been searching for its own identity and meaning. It started out as a vision of having data published on the Web just as the Web as we know it allows for the publication of documents. Translating a vision into concrete technologies is a lengthy process, subject to debates and trial-and-error phases, before you get to something everyone can see and play with. Now, we are getting on track, with data sets being published on the Web using Semantic Web technologies (the LOD cloud, Linked Open Commerce), dedicated high-end conferences (ISWC, ESWC, SemTech, …) and journals (JWS, SWJ, …). Outside of academia, there is also an increasing number of Semantic Web applications, but most of them are invisible to the end user. Have you noticed that Facebook uses Semantic Web technologies to mark up the pages for its famous “Like” button? Or that the NYTimes uses the same technologies to tag its articles? And these are only two examples out of many more.

As highlighted by Tom Ilube from Garlik (another company using Semantic Web technology), the Semantic Web is a change in the infrastructure of the Web itself that you won’t even see happening.
