Open Access News

News from the open access movement


Saturday, June 17, 2006

Open data, mashups, and re-usefulness

Eric K. Neumann, Freeing Data, Keeping Structure, BioIT, June 14, 2006. Excerpt:

Mashups illustrate the proposition that data need to not be too dependent on any single application. Typically, the phrase “data interoperability” is used to describe this, and several methodologies such as SOA attempt to address this issue. However, I will go one step further and suggest that recombinant data must be captured and defined in a way that is “application independent,” being free of any application formatting biases so that it has value on its own....Data should have strong value even for future applications that were not considered when the data was created, a property referred to as “re-usefulness.”...

Science Commons...executive director John Wilbanks hopes that by adding the ability to tag the legal use and distribution of knowledge and data, Web-based science resources can be guaranteed to be openly available for both academic and commercial R&D, and thereby promote innovation in science by lowering the legal and technical costs of the sharing and reuse of scientific work. Science Commons’ vision fits very well to the goals of Public Library of Science (PLoS), which aims to empower the science community by making published knowledge more accessible. Since rights cover data as well, it can also take full advantage of the RDF model for its representation. Such an experiment has been initiated for Science Commons’ NeuroCommons project....

Where is this taking us? Well we have hardly explored how to do a scientific mashup or what it means to take advantage of it. One thing is clear: If it is based on recombinant scientific data, a data description language such as RDF is necessary. Otherwise, the mashups will result in mush, unusable piles of unparsable data with unknown provenance! One of the project areas people are currently discussing is around a Neuroscience Mashup, where complex sets of data could all be co-joined by tissue locality as defined by a brain coordinate system. Data about neurological disorders, neurotransmitter receptor-types, neural functions, nerve fiber projections, and gene expression could all be co-registered for very powerful analyses and viewing. This scientific mashup would allow collaborating researchers to ask: “What genes are affected in responsive neural cells targeted by p38 inhibitors, and do these same cells go on to form amyloid plaques in Alzheimer’s-affected individuals?” Conceivably, this could have astounding benefits for research and medicine, but we’ll need to begin with a few incremental yet provocative demonstrations.
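To make the co-registration idea in the excerpt concrete, here is a minimal sketch of the triple model that underlies RDF, written in plain Python rather than with an RDF library. Everything in it is an assumption made for illustration: the region identifiers, the predicate names, the gene and receptor labels, and the helper functions `merge` and `describe` are hypothetical and do not come from Neumann's article or from the NeuroCommons project. Real RDF would use URIs in place of these short strings.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple, the basic unit of RDF.
# All identifiers and facts below are invented for illustration only.
expression_data = {
    ("region:CA1", "expressesGene", "gene:APP"),
    ("region:CA1", "expressesGene", "gene:MAPK14"),
    ("region:V1", "expressesGene", "gene:GRIN1"),
}

receptor_data = {
    ("region:CA1", "hasReceptorType", "receptor:NMDA"),
    ("region:V1", "hasReceptorType", "receptor:GABA-A"),
}


def merge(*graphs):
    """Combine independent triple sets; shared subjects line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every predicate and object recorded for one subject."""
    facts = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            facts[p].add(o)
    return dict(facts)


# Two datasets produced separately are joined on the shared region identifier.
mashup = merge(expression_data, receptor_data)
ca1 = describe(mashup, "region:CA1")

for predicate, objects in sorted(ca1.items()):
    print(predicate, sorted(objects))
```

Because both datasets name the brain region the same way, merging them is a plain set union and needs no application-specific import code. That is the sense in which a shared data description language keeps a mashup from turning into mush.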