I went to another meeting today about UK PubMedCentral. For the first time I began to feel a bit excited about the resource this project is building....
I came away feeling that the intention of the UKPMC project participants is to build a truly excellent, useful and usable resource for health science researchers. I also felt confident that the participants have it within them to do the job.
Sophia Ananiadou from NaCTeM explained the work her group has done using text mining techniques on Medline abstracts....Her aim is to enrich the literature by automatically creating semantic metadata, and thereby to make “undiscovered science” accessible. The MEDIE system is the most vivid example she showed, allowing you to construct a query in the form “subject – verb – object”. For instance, you can ask “what does p53 activate” by searching for subject=p53, verb=activate. Or you can ask “what causes colon cancer” by searching for verb=cause, object=colon cancer. I tried verb=read, object=book but I’m not sure what question that was answering. Currently this MEDIE system is just searching abstracts, but even so it does a pretty good job. It gives a hint of the power of text-mining techniques; I look forward to them being applied to the full-text corpus that is growing in PubMedCentral.
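To make the "subject, verb, object" style of query concrete, here is a minimal sketch in Python. It is not MEDIE's implementation or API: the `Triple` type, the `search` function and the hand-written sample triples are all hypothetical, and the sketch assumes the hard part (parsing sentences into triples) has already been done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, verb, object) relation extracted from a sentence."""
    subject: str
    verb: str
    obj: str


# Hypothetical toy data standing in for the output of a text-mining parser.
TRIPLES = [
    Triple("p53", "activate", "p21"),
    Triple("p53", "activate", "BAX"),
    Triple("APC mutation", "cause", "colon cancer"),
    Triple("MDM2", "inhibit", "p53"),
]


def search(triples, subject=None, verb=None, obj=None):
    """Return triples matching every field that was given; None is a wildcard."""
    def matches(value, wanted):
        return wanted is None or value.lower() == wanted.lower()

    return [
        t for t in triples
        if matches(t.subject, subject)
        and matches(t.verb, verb)
        and matches(t.obj, obj)
    ]


# "What does p53 activate?" -> leave the object open.
activated = [t.obj for t in search(TRIPLES, subject="p53", verb="activate")]

# "What causes colon cancer?" -> leave the subject open.
causes = [t.subject for t in search(TRIPLES, verb="cause", obj="colon cancer")]
```

Leaving a slot empty is what turns the triple into a question: the unfilled field is the answer being asked for.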
I also enjoyed seeing Peter Stoehr’s demonstration of CiteXplore....[which] is going to be at the heart of the UKPMC search service....One advantage it has over PubMed is the coverage: CiteXplore indexes about half a million extra references covering plant and animal science (from the Agricola database), plus a large collection of biological patents and abstracts of Chinese biological journals....
I was surprised to see that CiteXplore also has citation data. When you display a record it shows the standard bibliographic fields and abstract but it also shows where that article has been cited. And that’s not all: instead of just showing the citing reference it also shows the sentence in which the original article was cited, thus making it easier to interpret the significance of each citation....
Finally, CiteXplore has some features that draw on text-mining tools. When you display results you can ask it to highlight proteins in the results. It will then highlight any occurrences of protein names and turn them into links to UniProt. You can do the same for genes or protein-interactions....
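The highlighting step can be pictured with a small dictionary-based sketch in Python. This is only an illustration, not how CiteXplore works internally: the `PROTEINS` lookup table, the `link_proteins` function and the URL pattern are assumptions made for the example, and a real system would rely on a curated name dictionary and far more robust name recognition.

```python
import html
import re

# Hypothetical name -> UniProt accession lookup, for illustration only.
PROTEINS = {"p53": "P04637", "BRCA1": "P38398"}

# Assumed URL pattern for a UniProt entry page.
URL = "https://www.uniprot.org/uniprotkb/{acc}"


def link_proteins(text, proteins=PROTEINS):
    """Wrap each whole-word protein name found in text in an HTML link."""
    # Try longer names first so a long name wins over a shorter prefix.
    names = sorted(proteins, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def to_link(match):
        name = match.group(1)
        return '<a href="{}">{}</a>'.format(URL.format(acc=proteins[name]), name)

    # Escape the raw text first so only our own tags end up in the output.
    return pattern.sub(to_link, html.escape(text))


highlighted = link_proteins("Loss of p53 is common.")
```

The word-boundary match keeps the sketch from linking a name that merely appears inside a longer token, which is one of the simpler problems a real text-mining tool has to handle.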
One caveat – all this does presuppose that UKPMC is successful in its aim to gather in the full-text of published research articles. Open Access mandates from the research funders (MRC, CRUK, Wellcome, DoH etc), who are also funding UKPMC, will hopefully help to achieve a high rate of deposition, but it requires the cooperation of biomedical researchers, who have thus far not proved to be very enthusiastic about Open Access. The promise of a better literature search tool may help to persuade them it is worth it.
Posted by Peter Suber at 2/20/2009 01:14:00 PM.