
5-star Linked Open Data pays more than Open Data

Let’s assume you are the owner of a CSV file with some valuable data. You derive some revenue from it by selling it to consumers that do traditional data integration. They take your file and import it into their own data storage solution (for instance, a relational database) and deploy applications on top of this data store.

Traditional data integration

Data integration is not easy, and you’ve been told that Linked Open Data facilitates it, so you want to publish your data as 5-star Linked Data. The problem is that the first star requires an open license (follow this link for an extensive description of the 5-star scheme), and that sounds at odds with the idea of making money by selling the data :-/

If you publish your CSV as-is, under an open license, you get 3 stars but don’t make any money from serving it. Trying to get 4 or 5 stars means more effort from you as a data publisher and will cost you some money, still without earning any of it back…

Well, let’s look at this 4th star again. Going from 3 stars to 4 means publishing descriptions of the entities on the Web. Each of your data items gets a Web page of its own, carrying the structured data associated with it. For instance, if your dataset contains a list of cities with their populations, each city gets its own URI, and the description served at that URI states its population. From that point, you get the 5th star by linking these pages to other pages published as Linked Open Data.
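As a rough illustration of these two steps, here is a minimal Python sketch that turns a cities CSV into RDF statements in N-Triples syntax. The column names (`city`, `population`), the `http://example.org/` namespace, and the population property URI are all assumptions made up for the example. The optional `links` argument stands in for the 5th star by pointing a city at an external URI with `owl:sameAs`.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and property URI, chosen only for this example.
BASE = "http://example.org/city/"
POPULATION = "http://example.org/vocab/population"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def city_uri(name):
    """Mint a URI for a city: spaces become underscores, the rest is percent-encoded."""
    return BASE + quote(name.strip().replace(" ", "_"))


def csv_to_ntriples(csv_text, links=None):
    """Return a list of N-Triples lines describing each row of the CSV.

    `links` optionally maps a city name to an external URI (the 5th star).
    """
    links = links or {}
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = city_uri(row["city"])
        population = int(row["population"])
        # 4th star: the city has its own URI and a structured statement about it.
        triples.append(f'<{subject}> <{POPULATION}> "{population}"^^<{XSD_INTEGER}> .')
        # 5th star: link the city to a description published elsewhere.
        if row["city"] in links:
            triples.append(f'<{subject}> <{SAME_AS}> <{links[row["city"]]}> .')
    return triples
```

Calling `csv_to_ntriples(open("cities.csv").read())` on a file with those two columns (the file name is also hypothetical) would give the statements to serve at each city's URI. A real deployment would more likely use an RDF library and an established vocabulary.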

Roughly speaking, your CSV file is turned into a website, and this is how you can make money out of it. As with any website, visitors can look at individual pages and do whatever they want with them. They cannot, however, dump the entire website onto their own machine. Those interested in getting all the data can still buy it from you, either as a CSV or as an RDF dump.

Users of your data have the choice between two ways of using it: consume parts of the data through the Linked Open Data access, or buy it all and integrate it themselves. They are free to choose the best solution for them depending on their needs and resources.

Using Linked Open Data

Some added side bonuses of going 5-star instead of sticking at 3:

  • Because part of the data is open for free, you can expect more users to look through it and report errors;
  • Other data publishers can easily link their data sets with yours by re-using the URIs of the data items. This increases the value of the data;
  • In its RDF format, it is possible to add links within the data set, thereby doing part of the data integration work on behalf of the data consumers, who will be grateful for it!
  • Users can deploy a variety of RDF-enabled tools to consume your data in various ways.

Sounds good, doesn’t it? So, why not publish all your 3-star data as 5-star right away?😉

  1. August 1, 2012 at 13:45

    I agree with several of the points you make, but I’m not sure I understand why I should pay for a download of something I can crawl for free. What stops me from mirroring your website (and hence your data) using wget or other tools?

    You could think of a system that restricts the number of requests from a single IP, but (1) your system gets more complicated and (2) you may end up restricting several legitimate users who share the same IP.

    • August 1, 2012 at 13:54

      Thanks for your feedback. I didn’t mention it, but restricting requests to the site to prevent it from being mirrored is an important element of the solution. Otherwise, indeed, somebody could just crawl the data set. However, it is not that big a constraint, as many websites already use access control, if only to protect themselves from DoS attacks.
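      For illustration only, the kind of request restriction discussed here could look like the sliding-window limiter sketched below in Python. The class name, the limits, and the idea of keying on the client IP are assumptions made for this example, not a description of any particular server.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client, now):
        """Return True and record the request if `client` is under its limit at time `now`."""
        hits = self._hits[client]
        # Forget accepted requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

      A server would call `limiter.allow(client_ip, time.monotonic())` before answering each request and return an error status when it gets `False`. Keying on the IP address alone has the drawback raised above: users behind a shared address also share one allowance.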

  2. August 1, 2012 at 14:25

    I think whether that solution works depends on how the value of your data changes over time. If you have information about, say, stock options, it will work. If your data stays highly valuable over long periods of time (say, information about drug effects from a pharmaceutical company) and the price of an RDF or CSV dump is too high, it will still make sense to mirror the site, even at a slow pace.

  3. August 1, 2012 at 14:39

    Yes, unless there is something that limits mirroring capabilities. One could also consider selling services alongside the data dump. Compared to crawling the site at a very low rate and getting no support for what you collect, buying a data dump would provide you with all you need much faster, plus some help with making use of the data.

  4. August 2, 2012 at 04:05

    @alvarograves — Linked Data is akin to DNS for Data. Basically, it uses names to map out the underlying data substrate for the information super highway exposed by the Web.

    Bearing in mind my comments above, you can publish Linked Data and use Access Control Lists (ACL) based rules to constrain access to users and their user agents.

    Key to all of this is Web-scale verifiable identity which is what WebID (a nifty application of Linked Data) and its authentication protocol are all about.

    I provide a detailed response in the LOD mailing list thread at: http://bit.ly/Mdp4tO .

    Kingsley

  5. August 2, 2012 at 14:55

    I agree that WebIDs are the next thing to aim at. Using them, it becomes easier to track who dereferences what and to apply personalised restrictions. As a lighter, but also less powerful and less secure, alternative, an API key sent along with each request allows for some user-centered control of resource access.
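    A minimal sketch of the API-key alternative, in Python, might look like the following. The tier names, the quota numbers, and the `KeyRegistry` class are hypothetical. The point is only that access is counted per key rather than per IP address.

```python
# Hypothetical request quotas per tier; a real service would also reset them periodically.
QUOTAS = {"free": 100, "partner": 10_000}


class KeyRegistry:
    """Track how many requests each known API key has made against its tier's quota."""

    def __init__(self, key_tiers):
        self.key_tiers = dict(key_tiers)  # api key -> tier name
        self.used = {}                    # api key -> requests served so far

    def authorize(self, api_key):
        """Return True and count the request if the key is known and under quota."""
        tier = self.key_tiers.get(api_key)
        if tier is None:
            return False  # unknown key: refuse
        count = self.used.get(api_key, 0)
        if count >= QUOTAS[tier]:
            return False  # quota exhausted for this key
        self.used[api_key] = count + 1
        return True
```

    Because the count is attached to the key, two users behind the same IP address each get their own allowance. A plain key is only as secure as the channel it travels over.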
