Open Access News

News from the open access movement


Monday, April 07, 2003

The High Energy Physics Structure Project has a working online demo of an automated indexing and classification system for online research papers. Using the 50 top-cited papers from SLAC SPIRES, the project clustered the papers by theme, producing different clusters at different levels of generality. "The clustering was performed with an algorithm analysing the structure of the citation graph. The clusters obtained form the themes of the first level (the papers themselves form the themes of the zero level). Then the same clustering algorithm was applied to the set of themes of the first level, etc. This turned out to be a process convergent fast: at fifth level all the themes collapsed into a single theme, which is naturally called 'High Energy and Nuclear Physics'." (PS: This is a good example of the kind of sophisticated software that takes open-access articles as data for machines rather than as ends in themselves for human readers. It will make the papers more visible and useful than they already are. While it could be run on commercial databases, it will always be cheaper to run on open-access collections. This is why open-access collections stimulate the progress of indexing and classification software, and why the progress of this software stimulates authors and journals to provide open access to their articles. At the same time, of course, it answers the objection that open-access papers are not indexed in the traditional ways.)
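To make the level-by-level process in the quotation concrete, here is a minimal sketch in Python. The project does not say which clustering algorithm it uses, so the merge rule below (each theme joins the neighbour it is most strongly linked to) is only a stand-in, and the function names, the weighted-link input format, and the six toy papers are all hypothetical. What the sketch does share with the description is the loop: cluster the papers, treat the clusters as nodes of a new graph, and repeat until everything collapses into one theme.

```python
from collections import defaultdict


def cluster_once(nodes, edges):
    """Merge every node with its most strongly linked neighbour.

    Stand-in rule for illustration only; not the project's algorithm.
    `edges` maps (a, b) pairs with a < b to a positive link weight.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    neighbours = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbours[a][b] = w
        neighbours[b][a] = w

    for n in nodes:
        if neighbours[n]:
            # Ties go to the lowest-numbered neighbour, for determinism.
            best = max(sorted(neighbours[n]), key=lambda m: neighbours[n][m])
            parent[find(n)] = find(best)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return list(groups.values())


def coarsen(clusters, edges):
    """Build the next-level graph: one node per cluster, weights summed."""
    owner = {n: i for i, members in enumerate(clusters) for n in members}
    new_edges = defaultdict(int)
    for (a, b), w in edges.items():
        ca, cb = owner[a], owner[b]
        if ca != cb:
            new_edges[(min(ca, cb), max(ca, cb))] += w
    return list(range(len(clusters))), dict(new_edges)


def build_hierarchy(links):
    """Return a list of levels; each level is a list of sets of papers."""
    papers = sorted({p for pair in links for p in pair})
    index = {p: i for i, p in enumerate(papers)}
    edges = defaultdict(int)
    for (p, q), w in links.items():
        a, b = index[p], index[q]
        if a != b:
            edges[(min(a, b), max(a, b))] += w
    nodes, edges = list(range(len(papers))), dict(edges)

    themes = [{p} for p in papers]  # level 0: each paper is its own theme
    levels = [themes]
    # Stop when one theme remains or no links are left to merge along.
    while len(nodes) > 1 and edges:
        clusters = cluster_once(nodes, edges)
        themes = [set().union(*(themes[n] for n in c)) for c in clusters]
        levels.append(themes)
        nodes, edges = coarsen(clusters, edges)
    return levels


# Hypothetical toy data: link strength between pairs of papers.
links = {
    ("A", "B"): 3, ("B", "C"): 3,
    ("D", "E"): 3, ("E", "F"): 3,
    ("C", "D"): 1,
}
levels = build_hierarchy(links)
print([len(level) for level in levels])
```

On this toy input the six papers form two themes at the first level and a single theme at the second, a small-scale version of the collapse the project reports at its fifth level. If the link graph is disconnected, the loop stops with more than one top-level theme.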