Open Access News

News from the open access movement


Sunday, October 23, 2005

Repeat: Google is indexing books, not reprinting them

Danny Sullivan, Indexing Versus Caching & How Google Print Doesn't Reprint, Search Engine Watch, October 21, 2005. Excerpt:
I've written before that legal concerns about book indexing and Google Print may have repercussions for web indexing. Kevin Werbach and David Winer look at this again, afresh. A look at this, plus the crucial difference between indexing (making something searchable) and caching (reprinting content). Google's library scanning program makes things searchable in Google Print but [does not reprint them]....

When any search engine visits a web page, it effectively makes a copy of that page, which is stored in the index. But the index literally breaks the page apart. It stores where words were located, whether they were in bold, what other words they were near, whether they were in a hyperlink, and so on. Nothing in the index is anything you as a human being could read....The ability to opt out of the index is another reason why we really haven't had a major search engine sued over web search indexing. In addition, site owners, as Dave notes, generally want to be indexed so they can get traffic. In fact, the reason so many are upset over the current indexing update at Google is that they feel the changes are causing them to lose traffic. But whether it is LEGAL to do this type of indexing (as opposed to caching) still hasn't really been tested....

Here's the thing. Google is NOT, repeat NOT, republishing copies of books that it scans out of libraries. This is a fundamental mistake that many people seem to be making. Google is scanning books into an index, just as it spiders web pages and adds them to its index. It is making the books searchable by doing this, but that process does not republish the books in a way you can read. Think about it in web search terms. You can find a matching book, but there's NO hyperlink to click on that will take you to an online version of the book itself. There's just a snippet -- maybe -- of the text surrounding the words matching what you looked for. Want the actual book? Google Print won't give it to you. Instead, you have to go someplace and buy it or find it in a library. Google Print merely tells you the book may be what you're looking for. The only exception to this is if a publisher OPTS-IN. Not opt-out. If a publisher chooses, then -- and only then, for books that are in copyright -- will Google display some of the actual book. The exact amount is left up to the publisher....[B]ook search is actually more opt-in than web search is. Books themselves aren't cached or shown. But they are made searchable without permission....

Postscript: Ray Gordon writes to say he has filed a complaint arguing that web search on an opt-out basis is a violation of copyright. You can read the filings here. I've skimmed them, and he seems more concerned about Usenet material (rather than web material) that can't be removed, apparently because others may have reprinted his own posts.
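Sullivan describes an index that records where words occur, whether they were bold, which words they sit near, and whether they were in a hyperlink. That roughly matches a positional inverted index, the textbook structure for search. A toy Python sketch of one follows. The token format, the `Posting` fields, and the `build_index` and `near` helpers are hypothetical illustrations, not how Google or any other engine actually stores its data. The index is keyed by word rather than by page, so reading it in order does not give you the page as a reader would see it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    # Hypothetical record of one occurrence of a word in one document.
    doc_id: str
    position: int   # word offset within the document
    bold: bool      # was the word shown in bold?
    in_link: bool   # was the word part of a hyperlink?


def build_index(docs):
    """Build a toy positional inverted index.

    `docs` maps doc_id -> list of (word, bold, in_link) tuples,
    an assumed input format used only for illustration.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, (word, bold, in_link) in enumerate(tokens):
            # Keyed by lowercased word, not by document.
            index[word.lower()].append(Posting(doc_id, pos, bold, in_link))
    return dict(index)


def near(index, a, b, window=3):
    """Return the doc_ids where words a and b occur within `window` positions."""
    hits = set()
    for pa in index.get(a, []):
        for pb in index.get(b, []):
            if pa.doc_id == pb.doc_id and abs(pa.position - pb.position) <= window:
                hits.add(pa.doc_id)
    return hits


# Usage with two made-up "pages".
docs = {
    "page1": [("Google", False, False), ("indexes", True, False), ("books", False, True)],
    "page2": [("libraries", False, False), ("lend", False, False), ("books", True, False)],
}
index = build_index(docs)
print(near(index, "google", "books"))  # {'page1'}
```

A query such as "google near books" is answered entirely from the word-keyed postings. This is the "making something searchable" half of Sullivan's distinction. It does not require serving the page back to the searcher.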