Open Access News

News from the open access movement


Tuesday, October 31, 2006

Google Custom search and the OA repositories

Steve Hitchcock, Revamped Google service prompts new wave of repository search, Eprints Insiders, October 30, 2006.  Excerpt:

Google's Custom Search Engine has realised immediate results in the area of repository search services. Although not the first such service, and not even the first such service from Google, this one seems to have hit the mark where previous attempts to provide search that could be customised and directed to a specified range of repository sites ultimately proved unsatisfactory.

The sudden proliferation of these services will clearly put more pressure on formal national repository search services such as DAREnet and ARROW, and on OAI search services such as OAIster, Bielefeld Academic Search Engine (BASE) and the new ScientificCommons. All of these would have known from the outset they would be operating in the shadow of Google, but through OAI have some claim to index the 'deep' Web unseen by popular services....

If Google could support a simple quality-control "refereed material" tag then, according to Les Carr, we could get by without OAI and without repositories. "Well, it doesn't," Carr continued, "and so OAI still seems our best hope. However, even with five years of OAI our repositories are not doing a very good job of sharing metadata that helps a service to comprehend the status of the holdings that it harvests (is this a published, refereed journal article or equivalent? Is this a paper from an unrefereed workshop? Is this a chemical data file?) Too much is still down to interpretation and subsequent data mining of the web pages."

While Carr highlighted the need for improved metadata standards for repositories, other correspondents placed responsibility for improving services with the repositories and with Google. Andy Powell blogged the results of a rough-and-ready test of OpenDOAR search against native Google:

"Overall, what I conclude from this (once again) is that it is not the act of depositing a paper in a repository that is important for open access, but the act of surfacing the paper on the Web - the repository is just a means to an end in that respect....[O]ur 'resource discovery' efforts should centre on exposing the full text of research papers in repositories to search engines like Google and on developing Web-friendly and consistent approaches to creating hypertext links between research papers."

Peter Suber argued that Google will need to do more before OAI becomes redundant:  "Google (and Google Scholar and Google Custom Search) could neutralize some of the remaining advantages of OAI if it would (1) label peer-reviewed articles as peer-reviewed and (2) label OA articles as OA. It could make strides toward the first if it used, instead of discarding, the metadata it found in OA repositories. To make strides toward the second it would have to produce an OA-detecting algorithm that could distinguish an abstract from a full-text article. Authors could help by using machine-readable CC licenses, since the Google advanced search page already has a "usage rights" filter to limit results to CC-licensed content."
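
To make the two labels concrete, here is a minimal sketch of how a harvesting service might derive them from the unqualified Dublin Core metadata that OAI repositories expose. It is an illustration only, not a description of how Google or any OAI service works. The sample record, the function name label_record, and the convention that a repository marks refereed items with a dc:type value of "PeerReviewed" are all assumptions. As Carr notes above, repositories do not share this kind of status information consistently.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for the oai_dc metadata format.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, invented for this example. The "PeerReviewed"
# dc:type value is an assumed repository convention, not a guarantee.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:type>Article</dc:type>
  <dc:type>PeerReviewed</dc:type>
  <dc:rights>http://creativecommons.org/licenses/by/2.5/</dc:rights>
</oai_dc:dc>"""


def label_record(xml_text):
    """Return rough status labels for one oai_dc record."""
    root = ET.fromstring(xml_text)
    # Collect every dc:type and dc:rights value, normalised to lower case.
    types = [(el.text or "").strip().lower() for el in root.findall("dc:type", NS)]
    rights = [(el.text or "").strip().lower() for el in root.findall("dc:rights", NS)]
    return {
        "title": root.findtext("dc:title", default="", namespaces=NS),
        # Label (1): peer-reviewed, under the assumed dc:type convention.
        "peer_reviewed": "peerreviewed" in types,
        # A hint toward label (2): a machine-readable CC license URL.
        "cc_licensed": any("creativecommons.org/licenses/" in r for r in rights),
    }


print(label_record(SAMPLE_RECORD))
```

A CC license in dc:rights is only a hint toward the second label. It does not show whether the record points to a full-text article or just an abstract, which is the harder detection problem described above.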