Open Access News

News from the open access movement


Saturday, April 04, 2009

Data sharing is practical and valuable now, and will get better

Bill Hooker, "Why don't we share data? Not for the reasons Steven Wiley thinks we don't," Open Reading Frame, April 4, 2009. A response to Wiley's article in The Scientist (blogged here yesterday). Excerpt:

I disagree with the author, PNNL's Steven Wiley, on a number of points:

"Despite the appeal of making all biological data accessible, there are enormous hurdles that currently make it impractical. For one, sharing all data requires that we agree on a set of standards. This is perhaps reasonable for large-scale automated technologies, such as microarrays, but the logistics of converting every western blot, ELISA, and protein assay into a structured and accessible data format would be a nightmare -- and probably not worth the effort."

Wiley is making two mistakes here: setting the perfect against the good, and vastly underestimating human ingenuity.

Standards are inarguably required for automated sharing and essential for the sharing of ALL data, but that doesn't mean that sharing SOME data, with evolving standards or even without any standards, has no utility....

This leads me to the second mistake. It seems odd to me to insist that because standards are difficult to develop and implement, the bulk of such work is futile. The key is the phrase "currently... impractical". The whole concept of the internet was probably considered "currently impractical" by a great many people, until someone went and built it....

Moreover, I am not the only one who disagrees about the value of creating standards for difficult-to-share data. If you think western blots would be a nightmare, how about biodiversity data -- like, say, museum specimens? How about anthropometric data, exchangeable biomaterials, neuroscience data, electron micrographs, magnetic resonance images or microscopy images? The MIBBI project has dozens of other examples, the Open Biomedical Ontologies Foundry is working on dozens more, and Bioformats.org might offer a lightweight solution to some of the same problems....

I cannot begin to imagine how to build semantic and exchange standards for those kinds of data, but I'm not about to bet against the people currently trying to do so; nor do I believe that, once built, their systems will prove to have been "not worth the effort"....

Wiley goes on to say:

"Unfortunately, most experimental data is obtained ad hoc to answer specific questions and can rarely be used for other purposes."

which is just plain wrong. Much of the rationale for data sharing, the engine of much of its promise, is the simple observation that you cannot know what someone else will do with your data, particularly when they have access to lots of other people's data to go with it....