Open Access News

News from the open access movement


Wednesday, May 10, 2006

More on machine-readable scholarship

T. Scott Plutchak, Open Computation, T. Scott, May 9, 2006. Excerpt:

Coincidentally, a draft of an article by Cliff Lynch that started bouncing around the net yesterday points to some very intriguing ways of dealing with the problems so cogently identified by Mark D in his comments on my post yesterday. Lynch suggests that we are approaching the day when we can apply data mining and analysis techniques to large sets of articles in such a way that we can begin to automate the kinds of meta-analyses and knowledge discovery that are now so terribly labor-intensive....[W]e are overwhelmed with articles and reports and seem to have less and less time to actually make sense of them. Now we may have an opportunity to move past the focus on individual articles to developing systems that can do the kind of synthesis that we really need.

Lynch's article also addresses one of the questions that I've heard from publishers regarding the NIH public access program -- namely, why does NIH want to have all of these articles in a single repository?  Why isn't it sufficient to establish effective linkages to the publishers' sites?  Who cares where the articles actually reside?  I started to get a glimmer of the answer to this during Elias Zerhouni's talk at the AAMC meeting last fall.  If access to individual articles were the only issue, then linking is all that we need.  But Zerhouni is after something more than that.  In an article published earlier this spring in Health Affairs, he points out that "we have no place where the integration of information can be used as a powerful hypothesis generator as well as a powerful way of understanding change."  Lynch maintains that in order to start to develop systems that can do this kind of integration, centralized research databanks are more efficient than distributed ones....Lynch suggests that one of the benefits of more open access is that it will be easier to create such repositories, but that's clearly not essential, if publishers are willing to see the benefits of such repositories and rethink how they manage the control of the content that they own....Developing the kinds of open computation tools that Lynch envisions will have a far greater impact on the development of real knowledge than the elimination of subscription barriers to individual articles ever could.

Comment. Just a quick point on the final sentence. If T. Scott is saying that the importance of machine processing diminishes the importance of OA, then he may be overlooking one of the key points from Lynch's article: "Traditional open access is, in my view, a probable (but not certain) prerequisite for the emergence of fully developed large-scale computational approaches to the scholarly literature." But if he's saying that OA is important not just for direct human reading but even more for machine processing and indirect human reading, then I fully agree. Here's how I put this point in September 2002:

As we move further into an era in which serious research is mediated by sophisticated software, commercial publishers will have to put their works into the public Internet in order to make them visible to serious researchers. In this sense, the true promise of [open access] is not that scientific and scholarly texts will be free and online for reading, copying, printing, and so on, but that they will be available as free online data for software that acts as the antennae, prosthetic eyeballs, research assistants, and personal librarians of all serious researchers.