Open Access News

News from the open access movement


Friday, April 03, 2009

Some impediments to open data

Steven Wiley, Why Don't We Share Data? The Scientist, April 2009. Excerpt:

There are so, so many reasons—and they make a lot of sense.

We are constantly hearing suggestions to make all data gathered in biology experiments available online. This is an appealing idea because most data that we collect from experiments never sees the light of day. A smattering of our data appears in papers, of course, but we all recognize that this is usually a highly selected subset of all that is collected, intended to support the story that is being touted at the moment. If we could somehow make all of our data available to the community, the idea goes, biological progress would be greatly accelerated.

Despite the appeal of making all biological data accessible, there are enormous hurdles that currently make it impractical. For one, sharing all data requires that we agree on a set of standards. This is perhaps reasonable for large-scale automated technologies, such as microarrays, but the logistics of converting every western blot, ELISA, and protein assay into a structured and accessible data format would be a nightmare—and probably not worth the effort.

This does not mean that some instances of widespread data-sharing are not extraordinarily useful. However, these tend to be independent of a particular experimental context, the obvious example being DNA sequence or protein structure data. Some databases can also be very useful if the context is reasonably constrained. For example, tissue-specific expression profiles have proven useful, as have datasets gathered during different stages of development.

Unfortunately, most experimental data is obtained ad hoc to answer specific questions and can rarely be used for other purposes. Good experimental design usually requires that we change only one variable at a time. There is some hope of controlling experimental conditions within our own labs so that the only significantly changing parameter will be our experimental perturbation. However, at another location, scientists might inadvertently do the same experiment under different conditions, making it difficult if not impossible to compare and integrate the results.

The most significant issue inhibiting data sharing, however, is biologists' lack of motivation to do it. In order to sufficiently control the experimental context to allow reliable data sharing, biologists would be forced to reduce the plethora of cell lines and experimental systems to a handful, and implement a common set of experimental conditions. Getting biologists to agree to such an approach is akin to asking people to agree on a single religion. If you're still not convinced, consider the experience of the Alliance for Cell Signaling (AfCS)....