Mass Digitization Projects Update

By Heather Christenson, CDL Mass Digitization Project Manager

2008 was a busy year for the UC Libraries' book digitization activities. We continue to digitize tens of thousands of books from the print collections of many libraries across UC. In the latter half of 2008, our mass digitization projects responded to significant developments in the scholarly and commercial world. Microsoft ended its Live Search Books program, which had funded a portion of UC book digitization. Google announced a settlement with authors and publishers. UC also allied with the University of Michigan, Indiana University, and the Committee on Institutional Cooperation, or CIC (a consortium of Big Ten Plus universities in the Midwest), to anchor the new HathiTrust digital repository. Throughout, we have continued to steadily digitize books in partnership with the Internet Archive and Google.

Partnership Developments
Microsoft ceased funding the Internet Archive scanning projects in June 2008. Close to 150,000 public domain UC books were scanned with Microsoft funding under the auspices of the Live Search Books project. Although Microsoft took down the Live Search Books website, all books digitized through the Microsoft project remain available in full text via the Internet Archive and Open Library. Because of the loss of Microsoft funding for UC book scanning, we stopped scanning NRLF books with the Internet Archive and are now concentrating our efforts at NRLF on the Google project.

In October 2008, Google signed a settlement with a group of organizations representing authors and publishers, who had sued Google for copyright infringement. More information about the settlement is available on InsideCDL and the CDL website.

Also in October 2008, the UC Libraries joined the HathiTrust digital repository. From the HathiTrust website:

“Launched jointly by the 12-university consortium known as the Committee on Institutional Cooperation (CIC) and the 11 university libraries of the University of California system, the HathiTrust leverages the time-honored commitment to preservation and access to information that university libraries have valued for centuries.  UC’s participation will be coordinated by the California Digital Library (CDL), which brings its deep and innovative experience in digital curation and online scholarship to the HathiTrust.”

Read more about the HathiTrust.

In addition to coordinating mass digitization activities across the UC campuses and our digitization partners, CDL put the Mass Digitization Inventory Database (MDID) into production in July 2008 and conducted training for campus representatives. MDID gives us an aggregate picture of the volume of our two major projects and holds promise for future uses of the data.

We are beginning to engage actively with HathiTrust, including learning about the HathiTrust technical environment and planning for the ingest of UC mass-digitized books into the repository.

At the request of the University Librarians, a small working group with campus representation is being launched to investigate print-on-demand options for UC mass-digitized books that are in the public domain. We will report more at the conclusion of this project, currently targeted for early summer.

On Campus
NRLF, UCSD and UCSC continue their significant efforts and contributions to the Google project. At UCSD, we have been working in the International Relations and Pacific Studies Library and the East Asian Library, and have recently begun working in the Scripps Institution of Oceanography Library. UCSC is digitizing items from the McHenry Library. At NRLF, we continue to work through the shelves at a truly impressive rate.

In July 2008, the Internet Archive scanning center at NRLF moved to San Francisco, where the Internet Archive continues to run it. In December 2008, CDL presented NRLF with an award for its long-running efforts and for making history as the first Internet Archive/Open Content Alliance scanning location in the United States.

In October 2008, we launched a project with UC Davis to scan California Department of Water Resources Bulletins and Bureau of Mines Bulletins. UC Davis volumes were shipped to the Internet Archive scanning center in San Francisco, and the work is nearly complete. This effort gave both CDL and UC Davis an opportunity to learn how to organize and execute a smaller-scale mass digitization project with a new campus partner on a short timeframe. More information about the project can be found in the related CDLINFO posting.

The Internet Archive scanning center at SRLF continues to operate at an impressive rate, and the patience and efficiency of SRLF staff have been central to this effort. All of the pre-1923 English-language books originally targeted for digitization at SRLF, close to 93,000 volumes, have recently been completed.

I am pleased to report that, across all of our projects, UC has collectively digitized over 1.8 million volumes.

For more information about our UC mass digitization projects, and about where to access our digitized books, please see the recently updated information and FAQ on InsideCDL.