
Mass Digitization Projects Update: 2009 in Review

By Heather Christenson, CDL Mass Digitization Project Manager

The UC Libraries made great progress in mass digitization of books in 2009, despite significant fiscal challenges. In April 2009 we reached a major milestone: over 2 million books digitized!

In August 2009 we completed scanning over 100,000 English-language, public domain (publication year pre-1923) books housed at the Southern Regional Library Facility (SRLF). This marked the successful completion of a joint effort with the Internet Archive (previously funded by Microsoft) that resulted in approximately 200,000 new digital books. Through the efforts of UC Davis, we also completed the digitization of two significant sets of California documents via Internet Archive: the Bulletin of the California Division of Mines and Geology and the Bulletin of the California Department of Water Resources. In October 2009, the Biodiversity Heritage Library announced the incorporation of selected UC public domain books digitized by Internet Archive into its search and access services.

Google projects at UC Santa Cruz, UC San Diego, and the Northern Regional Library Facility (NRLF) have resulted in the digitization of a wide variety of UC collections.  UCSD has focused on the International Relations and Pacific Studies collection, Scripps Institution of Oceanography Library, and the East Asia collection.  UCSC continues to digitize collections from both the McHenry Library and the Science & Engineering Library. We’ve worked our way through over 40% of NRLF, a remarkable achievement.  We finished the year with planning meetings for UCLA as our next Google project location.

As anyone who reads the news knows, the landscape surrounding book digitization is complex and dynamic.   CDL created a “UC and the Google Book Settlement” FAQ (http://osc-s10.cdlib.org/google/) in response to faculty discussion surrounding the proposed Google settlement with authors and publishers, and we continue to monitor the implications of the Settlement as it affects our commitment to digitize UC books.  With an eye towards moving more of our “orphan works” into the public domain, CDL, along with UCLA, participated in a pilot test of OCLC’s Copyright Evidence Registry.  Although the CER remains a beta project, CDL continues to explore avenues to make fully available as many UC library books as possible.

In concert with coordinating the digitization of thousands of books daily, the CDL Mass Digitization Team embraced the challenge of stewardship of UC’s digital books by leading activities that are key to our participation in the HathiTrust.   Working with our HathiTrust partners at the University of Michigan, CDL accomplished the transfer of over 1.2 million Google-digitized UC volumes (and counting!) into the HathiTrust Digital Library for preservation, discovery and access.  Due to CDL efforts in 2009 to develop standards and ingest flows for Internet Archive content, our UC Internet Archive-digitized books will go into the HathiTrust in the coming months.

The CDL Mass Digitization Team has also played a leadership role amongst the Google library partners in setting standards and advocating for high-quality output of digital volumes from Google. By spearheading discussion on key technical issues such as quality metrics, metadata, de-duplication, and error rates, we have created a richer dialogue with Google that will serve UC and all the Google partners.

I am pleased to report that, across all of our projects, UC has collectively digitized close to 2.5 million volumes. Nearly 450,000 of those volumes are in the public domain and available to all via Google Book Search, the Internet Archive, and soon, HathiTrust.

Special thanks and congratulations go to the project teams on all of the participating UC campuses – the real stars behind this grand experiment that is mass digitization: Scott Miller (NRLF), Jutta Wiemhoff (NRLF), Shondell Beck (NRLF), David Zuckerman (UCB), Colleen Carlton (SRLF), Matt Smith (SRLF), Martha Hruska (UCSD), David Jahn (UCSD), Ryan Finnerty (UCSD), Roger Smith (UCSD), Sue Chesley Perry (UCSC), Maric Kramer (UCSC), David Meyer (UCSC), Karen Andrews (UCD), and the many others who have contributed.

More information about our UC Mass Digitization projects, including the UC Mass Digitization FAQ and our new Where to Find Our Books page, is available at: http://www.cdlib.org/services/collections/massdig/.