Open Access News

News from the open access movement


Saturday, August 26, 2006

The task of digitizing print books

Roger C. Schonfeld and Brian F. Lavoie, Books without Boundaries: A Brief Tour of the System-wide Print Book Collection, Journal of Electronic Publishing, Summer 2006.
Abstract: Print book collections are facing significant transformation in response to mass digitization, remote storage, and preservation. These issues should be considered within a system-wide context in which individual print book collections are viewed not as isolated units, but rather as parts of a larger whole. As libraries look beyond the boundaries of their local print book collections to consider system-wide implications, they will need to be equipped with data and analysis about the system-wide print book collection. This paper provides a brief overview of the system-wide print book collection, defined as the combined print book holdings of libraries everywhere, as reflected in the WorldCat bibliographic database. Issues addressed include the size of the collection; holdings patterns; distribution by publication date and language; and the relationship of the system-wide print book collection to overall book production. The paper concludes with a brief discussion of some implications of the analysis, and possible directions for future research.

PS: This article is more relevant to OA than the abstract might suggest. Here are three bits, with my comments in parentheses.

  • It's hard to count how many distinct books have ever been published, but "the closest approximation" is WorldCat, which, with a little refining, identifies 24 million. (Google's initial goal of digitizing 15 million books, assuming no duplicates, would put it over the halfway mark.)
  • 9.5 million books are held by only one library each. (Incredible. Digitization and OA would be like discovering a lost civilization.)
  • Only 2.4 million books are held by more than 50 libraries each and only 301,000 are held by more than 500 libraries each. (Books are long-tail, both in distribution and demand. We need digitization and OA to overcome the limitations of even the largest print collections.)
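
For readers who want the proportions behind these figures, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above. Two assumptions are mine, not the article's: that the 9.5 million single-holder books are counted against the same 24 million base, and that Google's 15 million contain no duplicates and all fall within that base. The variable and function names are purely illustrative.

```python
# Figures quoted in the post, in millions of books.
worldcat_total = 24.0   # distinct books identified via WorldCat
google_goal = 15.0      # Google's initial digitization goal
single_holder = 9.5     # books held by only one library each

def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Assumption: no duplicates among Google's 15 million, all within the 24 million.
google_share = share(google_goal, worldcat_total)

# Assumption: the single-holder count uses the same 24 million base.
single_share = share(single_holder, worldcat_total)

print(f"Google's goal covers about {google_share}% of the collection")
print(f"Single-holder books make up about {single_share}% of the collection")
```

Under those assumptions, Google's goal comes to about 62.5% of the collection, and books held by a single library make up about 39.6%.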