Bakfiets by Anomalily on Flickr
Anyone visiting the Netherlands will inevitably stumble upon a “bakfiets” in the streets. This Dutch speciality, which looks like the result of cross-breeding a pick-up truck with a bike, can be used for many things, from getting the kids around to moving a fridge.
Now, let’s consider a Dutch bike shop that sells some bakfiets among other things. In its information system these items will surely be labelled as “bakfiets”, because this is just what they are. The information system as a whole can also be expected to be filled with inputs and semantics (table names, field names, …) in Dutch. If that bike shop wants to start selling its items outside of the Netherlands, the data will need to be exported into some international standard so that other sellers can re-import it into their own information systems. This is where things get problematic…
What will happen to the “bakfiets” during the export? As it does not make sense to define an international-level class “bakfiets” (which can be translated as “freight bicycle”), every shop item of type “bakfiets” will most certainly be exported as an item of type “bike”. If the Dutch shop owner is lucky, the standard may let him indicate through a comment property that, no, this is not just a standard two-wheeled bike. But even if the importer is able to use that comment (which is not guaranteed), the information is lost: when going international, every “bakfiets” becomes a regular bike. Even more worrying, on top of the information loss itself, there is no indication of how much information is gone.
When the data is exported from one system and re-imported into another, specificity is lost
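To make the loss concrete, here is a minimal Python sketch of such a flattening export. The mapping table, the item names and the single `type` field are all assumptions made up for illustration; no real exchange standard is being modelled.

```python
# Hypothetical mapping from the shop's local (Dutch) types to the only
# types an imagined international exchange format knows about.
LOCAL_TO_STANDARD = {
    "bakfiets": "bike",  # freight bicycle: no equivalent class in the standard
    "fiets": "bike",     # regular bicycle
}


def export_item(item):
    """Flatten a local shop item to the single type the standard allows."""
    return {"name": item["name"], "type": LOCAL_TO_STANDARD[item["type"]]}


# Made-up inventory of the Dutch shop.
inventory = [
    {"name": "Cargo Long", "type": "bakfiets"},
    {"name": "City Classic", "type": "fiets"},
]

exported = [export_item(item) for item in inventory]

# After the export only one type is left, and nothing in the exported
# records says that a more specific type ever existed.
distinct_types = {item["type"] for item in exported}
```

In this sketch `distinct_types` ends up containing only `"bike"`: the importer cannot tell the two items apart, nor can it tell that anything was dropped.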
Semantic Web technologies can help here by making it possible to qualify shop items with facets rather than strict types, that is, to assign labels or tags to things instead of putting items into boxes. The Dutch shop will be able to express in its knowledge system that its bikes with a box are both of the specific type “bakfiets”, which makes sense only in the Netherlands, and instances of the international type “bike”. An additional piece of information in the knowledge base connects the two types, saying that the former is a specialisation of the latter. The resulting information export flow is as follows:
- The Dutch shop assigns all the box-bikes to the class “bakfiets” and the regular bikes to the class “bike”.
- A “reasoner” infers that, because all the instances of “bakfiets” are specific types of “bike”, all these items are also of type “bike”.
- Another, non-Dutch shop asking the Dutch shop for instances of “bike” will get a complete list of all the bikes and see that some of them are actually of type “bakfiets”.
- If the importer’s own knowledge system does not let him store facets, he will have to flatten the data to one class, but he will have received the complete information and will know how much of it is lost by removing facets.
The shared data has different facets from which the data importer can choose
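The export flow listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the knowledge base is a set of (subject, predicate, object) triples, the item names are invented, and the tiny `infer_types` function stands in for a real reasoner. The rule it applies mirrors the subclass entailment rule of RDFS, but no Semantic Web library is used here.

```python
# Predicates of a toy knowledge base made of (subject, predicate, object) triples.
TYPE = "type"
SUBCLASS_OF = "subClassOf"

# What the Dutch shop asserts (item names are made up for illustration).
triples = {
    ("bakfiets", SUBCLASS_OF, "bike"),  # a bakfiets is a specific kind of bike
    ("cargo_long", TYPE, "bakfiets"),   # a box-bike
    ("city_classic", TYPE, "bike"),     # a regular bike
}


def infer_types(asserted):
    """Toy reasoner: if x has type C1 and C1 is a subclass of C2, x has type C2."""
    kb = set(asserted)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for (item, pred, cls) in list(kb):
            if pred != TYPE:
                continue
            for (sub, pred2, sup) in list(kb):
                if pred2 == SUBCLASS_OF and sub == cls:
                    new_triple = (item, TYPE, sup)
                    if new_triple not in kb:
                        kb.add(new_triple)
                        changed = True
    return kb


def instances_of(kb, cls):
    """All items carrying the facet `cls`."""
    return sorted(s for (s, p, o) in kb if p == TYPE and o == cls)


def facets_of(kb, item):
    """All facets (types) attached to `item`."""
    return sorted(o for (s, p, o) in kb if p == TYPE and s == item)


kb = infer_types(triples)

# A foreign shop asking for bikes gets both items...
bikes = instances_of(kb, "bike")
# ...and can see which facets each one carries.
facets = facets_of(kb, "cargo_long")
# An importer that can only keep "bike" knows exactly what it drops.
dropped = [f for f in facets if f != "bike"]
```

Here `bikes` lists both items, `facets` shows that `cargo_long` is both a “bakfiets” and a “bike”, and `dropped` tells an importer limited to one class which facet it is about to discard.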
Beyond this illustrative example, data export presents real issues in many cases. Everyone usually wants to express their data using the semantics that apply to them, and has to force the information into some other conceptualisation framework when the data is shared. A more detailed case for research data can be found in the following preprint article:
- Christophe Guéret, Tamy Chambers, Linda Reijnhoudt, Frank van der Most, Andrea Scharnhorst, “Genericity versus expressivity – an exercise in semantic interoperable research information systems for Web Science”, arXiv preprint http://arxiv.org/abs/1304.5743, 2013