Open Access News

News from the open access movement


Tuesday, April 21, 2009

What data should publishers make OA?

David Shotton, Semantic publishing: the coming revolution in scientific journal publishing, Learned Publishing, April 2009; see also this self-archived version. (Thanks to Gerry McKiernan.)

... STM publishers have already aligned themselves with the aims of semantic publishing, and are seeking ways to implement these commitments in an affordable manner. Lest some subscription-access publishers be anxious about giving away information associated with their published articles, it may be instructive to look at three examples where giving away data and metadata has brought financial benefit. [Note: omitting the examples.] ...

So what data should the publishers be making freely available? Clearly they should provide the datasets that underlie the figures and tables in their articles, and machine-readable provenance information about the article itself. But machine-readable reference lists should also be made available, so that citation networks can be created, analysed and used to promote reader traffic to both citing and cited articles, to the benefit of the publishers concerned. Furthermore, publishers already have extensive sectional mark-up for their articles within the XML created during the publication process, many using a recognized de facto international standard, the National Library of Medicine's XML document format. It would be hugely advantageous if this information was also made available on line, rather than being discarded upon creation of PDF versions of the articles.

Fortunately, with the introduction of the new ACAP [Automated Content Access Protocol] open standard that enables publishers to express terms under which automated access to website content can be regulated, and the increasing employment of Creative Commons licenses regulating rights for reuse, publishers have the means to specify clearly which data are to be made freely available. The open question of who should host the data published to the Web in this manner – whether publishers should each host the datasets relevant to their own publications, or whether there should be independent data repositories, equivalent to SourceForge for open access [sic] software – will be decided in practice, but this is a secondary concern. The important thing is to get the relevant data onto the Web, no matter where they are hosted! ...

See also our past posts on David Shotton.