CKAN->network exporter for the LOD Cloud

The LOD cloud as rendered by Gephi

One year ago, we posted a first network model of the LOD cloud on the LarkC blog. Network analysis software can highlight aspects of the cloud that are not directly visible otherwise, in particular the presence of dense sub-groups and of several hubs, whereas in the classical picture DBpedia is easily perceived as the only hub.

Computing network measures such as centralities, the clustering coefficient or the average path length can reveal much more about the content of a graph and the interplay of its nodes. As shown since that blog post, this information can be used to assess the evolution of the Web of Data and devise actions to improve it (see the WoD analysis page for more information about our research on this topic). Unfortunately, the picture provided by Richard and Anja on lod-cloud.net cannot be loaded directly into network analysis software, which expects a .net file or CSV files instead. Fortunately, thanks to the very nice API of CKAN.net, it is easy to write a script that generates such files. We made such a script and thought it would be a good idea to share it 🙂

The script is hosted on GitHub. It produces a “.net” file in the Pajek format and two CSV files, one for the nodes and one for the edges. These CSV files can then easily be imported into Gephi, for instance, or any other software of your choice. We also made a dump of the cloud as of today and packaged the resulting files.
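To give an idea of what these output files look like, here is a minimal sketch of writing a small graph as a Pajek “.net” file and as node and edge CSVs. It is not the code of the script on GitHub: the function names, the CSV column names (chosen to match the headers Gephi's spreadsheet import usually recognises) and the link weights of 1 are all assumptions made for illustration. The dataset names are the ones that appear in the script's log output.

```python
import csv
import io


def write_pajek(nodes, edges, out):
    """Write a directed graph to a text stream in Pajek .net format."""
    # Pajek numbers vertices from 1, so map each dataset name to an index.
    index = {name: i for i, name in enumerate(nodes, start=1)}
    out.write("*Vertices %d\n" % len(nodes))
    for name, i in index.items():
        out.write('%d "%s"\n' % (i, name))
    # "*Arcs" introduces directed links ("*Edges" would be undirected).
    out.write("*Arcs\n")
    for source, target, weight in edges:
        out.write("%d %d %d\n" % (index[source], index[target], weight))


def write_csv(nodes, edges, nodes_out, edges_out):
    """Write one CSV for the nodes and one for the edges."""
    node_writer = csv.writer(nodes_out, lineterminator="\n")
    node_writer.writerow(["Id", "Label"])  # assumed column names
    for name in nodes:
        node_writer.writerow([name, name])
    edge_writer = csv.writer(edges_out, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Weight"])  # assumed column names
    edge_writer.writerows(edges)


# Toy graph: real dataset names, placeholder weights (not actual link counts).
nodes = ["fu-berlin-stitch", "bio2rdf-pubchem", "dbpedia", "bio2rdf-chebi"]
edges = [
    ("fu-berlin-stitch", "bio2rdf-pubchem", 1),
    ("fu-berlin-stitch", "dbpedia", 1),
    ("fu-berlin-stitch", "bio2rdf-chebi", 1),
]

pajek = io.StringIO()
write_pajek(nodes, edges, pajek)

nodes_csv, edges_csv = io.StringIO(), io.StringIO()
write_csv(nodes, edges, nodes_csv, edges_csv)

print(pajek.getvalue())
print(edges_csv.getvalue())
```

To write real files, pass objects returned by `open(path, "w", newline="")` instead of the `io.StringIO` buffers.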

Have fun analysing the graph and let us know if you find something interesting😉
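If you would rather start from a few lines of code than from a GUI, the sketch below computes two of the measures mentioned above, degree centrality and average path length, using only the Python standard library. It is only an illustration under some assumptions: the edge list is a toy example rather than the real cloud, links are treated as undirected when measuring path lengths, and the function names are ours. A dedicated library or Gephi itself will do a better job on the full graph.

```python
from collections import defaultdict, deque


def degree_centrality(edges):
    """Degree of each node (in + out) divided by the number of other nodes."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in degree.items()}


def average_path_length(edges):
    """Mean shortest-path length over all connected pairs of nodes.

    Links are treated as undirected; unreachable pairs are ignored.
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    total = pairs = 0
    for start in list(neighbours):
        # Breadth-first search gives shortest distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the start node itself
    return total / pairs if pairs else 0.0


# Toy edge list (source, target); not the real LOD cloud.
edges = [
    ("fu-berlin-stitch", "bio2rdf-pubchem"),
    ("fu-berlin-stitch", "dbpedia"),
    ("fu-berlin-stitch", "bio2rdf-chebi"),
]

centrality = degree_centrality(edges)
# The node with the highest degree centrality is the hub of this toy graph.
print(max(centrality, key=centrality.get))
print(average_path_length(edges))
```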

 

  1. June 30, 2011 at 18:12

    GitHub script bails with:

    Add fu-berlin-stitch
    -> bio2rdf-pubchem
    -> dbpedia
    -> bio2rdf-chebi
    Traceback (most recent call last):
      File "main.py", line 199, in <module>
        main()
      File "main.py", line 129, in main
        node = get_package(package,conn)
      File "main.py", line 67, in get_package
        triples = int(data['triples'])
    ValueError: invalid literal for int() with base 10: '992,797'

  2. June 30, 2011 at 20:43

    Thanks for the feedback! I’m sorry it doesn’t work for you, I’ll fix that as fast as possible.

    • June 30, 2011 at 21:04

      no problem. The data probably changed since you last tried. nice script!

  3. July 1, 2011 at 09:34

    Fixed! Thanks again for the bug report🙂
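For anyone hitting the same `ValueError` in their own scripts: the error arises because `int()` does not accept thousands separators such as the comma in '992,797'. The fix applied to the script is not shown here, so the following is just one possible way to handle it. The helper name `parse_count` is hypothetical, and it assumes that commas and spaces in the value are thousands separators, never decimal marks.

```python
def parse_count(raw):
    """Convert a count such as '992,797' to an int.

    Assumes commas and spaces are thousands separators only.
    """
    cleaned = str(raw).replace(",", "").replace(" ", "")
    return int(cleaned)


print(parse_count("992,797"))
```

Values that are still not numeric after cleaning (for example an empty string) raise `ValueError` as before, so the caller can decide whether to skip the dataset or fall back to a default.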
