I hope to come back to various points raised in my presentation over time, but right now I want to focus on one area that has sparked a good deal of debate (such as here, here and here with much twittering too). Right in the middle of the presentation I offered three conjectures, the first of which was data outlasts code which lead me to then assert that therefore open data is more important than open source. This appears to be controversial. ...
My point was that code is tied to processes usually embodied in hardware whereas data is agnostic to the hardware it resides on. The audience at the conference understand this already: they are archivists and librarians and they deal with data formats like MARC which has had superb longevity. Many of them deal with records every day that are essentially the same as they were two or three decades ago. Those records have gone through multiple generations of code to parse and manipulate the data. ...
It’s true that you need code to access data, but critically it doesn’t have to be the same code from year to year, decade to decade, century to century. Any code capable of reading the data will do, even if it’s proprietary. You can also recreate the code whereas the effort involved in recreating the data could be prohibitively high. ...
Here’s the central asymmetry that leads me to conclude that open data is more important than open source: if you have data without code then you could write a program to extract information from the data, but if you have code without data then you have lost that information forever. ...
Of course we want open standards, open source and open data. But in one or two hundred years which will still be relevant? Patents and copyrights on formats expire, hardware platforms and even their paradigms shift and change. Data persists, open data endures.
The problem we have today is that the open data movement is in its infancy when compared to open source. We have so far to go, and there are many obstacles. One of the first steps to maturity is to give people the means to express how open their data is, how reusable it is. The Open Data Commons is an organisation explicitly set up to tackle the problem of open data licensing. If you are publishing data in any way you ought to check out their licences and see if any meet with your goals. If you licence your data openly then it will be copied and reused and will have an even greater chance of persisting over the long term. ...
Posted by
Gavin Baker at 3/05/2009 02:16:00 PM.
The open access movement:
Putting peer-reviewed scientific and scholarly literature
on the internet. Making it available free of charge and
free of most copyright and licensing restrictions.
Removing the barriers to serious research.
I recommend the OA tracking project (OATP) as the best way to stay on top of new OA developments. You can read the OATP feed on a blog-like web page or subscribe to it by RSS, email, or Twitter. You can also help build the feed by tagging new developments you encounter.