Open Access News

News from the open access movement


Wednesday, January 07, 2009

Unlocking the precursors to knowledge

John Wilbanks, "Beliefs, Knowledge, Articles, Databases," Common Knowledge, January 5, 2009. Excerpt:

...[O]n the web, we have these things that are kind-of-knowledge. Databases. Journal articles. Web pages. Ontologies.

Taken together, these things are somewhere in the epistemological chain. But the act of digitizing them does some strange things...they start to form an observable, computable network, a knowledge web of sorts. And in a knowledge web, we have to understand an important conceptual transformation: that knowledge itself needs to be treated as something similar to software, something upon which computing happens and depends - and the implications of that transformation....

But knowledge is different, as the vast majority of the canon is already embedded in creative works protected by copyrights. Thus, we have to unlock some content if we're going to reformat it into something that can in turn be treated as an interim step along the way to knowledge, and then used as cyberinfrastructure. This is why Open Access is so crucial. Whatever knowledge is, a lot of it is locked behind paywalls, copyright licenses, or trapped in lousy formats from a machine perspective.

But - if we have access - if we can take the individual facts described in papers and turn them into modelable knowledge, or at least precursors to knowledge, we convert those facts into infrastructure for construction into something bigger, for composition into structures that software can use.

This transformation is already under way in the life sciences. Most of the valuable CI [cyberinfrastructure] data in the life sciences has been hand-curated out of journal articles into more structured sources like the Kyoto Encyclopedia of Genes and Genomes, or the Human Protein Reference Database, or the Information Hyperlinked Over Proteins, and on and on.

This needs to be accelerated and industrialized, as the human-readable paper is the least valuable format of knowledge from a cyberinfrastructure/CI perspective....

But this requires an understanding of access to the knowledge canon as a fundamental lever of CI construction in a knowledge web. Unfortunately most of these databases tend to have copyright or contractual restrictions that make it impossible to build on them as infrastructure (particularly non-commercial restrictions or restrictions on redistribution in federated or integrated knowledgebases). That's why open access to databases is essential as well.

We are lucky to have vast amounts of public domain databases that are, from a CI perspective, un-networked. The scientist needs to open a dozen or more tabs in a browser and use her own mind to integrate the results. That's lousy. But it's a natural outcome of the web not integrating databases the way it integrates documents, and at least the legal terms let us start to integrate....