Over lunch in 2001, Google co-founder Larry Page and Stanford University librarian Michael Keller marveled at how much information was still locked inside books in libraries' stacks worldwide and how wonderful it would be if all that information were digital and searchable. Today, more than a million books from over 10,000 publishers and 27 libraries, both in the United States and abroad, are available on Google's Book Search, one of many similar services available online to the general public.
Book digitization has increasingly come into the spotlight as the next step in making information more accessible. Hundreds of libraries worldwide, including all seven of the other Ivies, are involved in some form of digitization of their collections.
Many universities, including Harvard, Cornell, Princeton, Yale and Stanford, are already digitizing between 500,000 and 1,000,000 out-of-print books from their collections. The University of Michigan is one of a handful planning to make their entire collections accessible and searchable online.
Dartmouth's libraries have taken a less conventional approach to digitization.
"[Dartmouth] is very interested in digitizing the unique materials of Dartmouth -- those in Rauner, for example," said Jeffrey Horrell, dean of libraries.
The College has several digitization projects currently in progress, the most ambitious being the scanning of the U.S. Congressional Serial Set, records from the U.S. Senate and House of Representatives dating back to 1817. At no cost to the College, a digitization company in Chester, Vt., is scanning all 12 million pages of the Serial Set as part of a four-year project and will provide the College with a searchable digital copy.
"They are scanning 13,000 pages per day," Horrell said. "This project is a wonderful opportunity for us, and something that has amazing research value."
Also in progress is the digitization of the manuscript of the Encyclopedia Arctica, an unpublished reference work that is part of the Stefansson collection. The encyclopedia is unique in its category and will contribute not only to teaching and research at Dartmouth, but also to scholarship in general, Horrell said.
Dartmouth also proposes to digitize the papers of Samson Occom, one of Dartmouth's founders. Ivy Schweitzer, a professor in the English department, may use these papers in a future course.
Stanford University's digitization project is on a much larger scale than Dartmouth's.
"We want to digitize every book we have, every map, every unpublished manuscript," said Andrew Herkovic, director of communications and development at Stanford University Libraries. The university is currently involved in an open-ended project with Google's Book Search to continuously digitize its collection.
Herkovic also said he believes that digitization will increase physical library use. Students will continue to read hard copies of books even when digital versions are available, despite earlier predictions that physical library space would eventually become irrelevant.
"Dartmouth is a prime example," Horrell said. "We open at 8 a.m., and the students pour in and fill it up."
Though digitization drastically increases the availability of information, copyright infringement could pose a significant challenge.
"Michigan University, I believe, is sending everything they've got [to Google], without regard to copyright," Herkovic said. "[However], Stanford has not committed to digitizing every book."
Though most books published before 1923 can be considered in the public domain and, hence, legally accessible online in their entirety by the general public, the copyright status of works published in the 84 years since then is still disputed.
"It's a very murky business," Herkovic said. Google takes a conservative approach to the copyright problem, assuming that it has the right to scan and index the words on every page of every book, but not to make any book available online for free.
Google's Book Search implements a tiered approach to granting access to copyrighted books, said Google spokeswoman Jennie Johnson. Every page of books whose copyrights have expired is available online and can be downloaded in PDF form. Books uploaded by publishers are partially available by page. Books still protected under copyright law can be searched for keywords, but only the section of the book in the direct vicinity of the search term will be visible.
Johnson said that visitors to the Book Search site are more likely to click on the "buy the book" link if more of the book is visible online.
"It's just like if you were in Borders or Barnes & Nobles," Johnson added. "If you can read more of the book, you can make a better judgement as to whether the book is suitable for your needs."
Horrell said that Dartmouth is not involved in a large-scale digitization project of its collection because the College's small size would make it a low priority for Google, Microsoft or other digitization initiatives.