Digital Collection Development Projects

Web Archiving Projects

In addition to supporting mass digitization projects, we help build digital content collections by collaborating with partners within UC and the broader web archiving community on numerous web archiving projects to expand collective capacity to steward web archive collections. CDL develops tools for researchers and curators, collection development, and discovery and access, including Cobweb, an open-source platform that supports collaborative collection development for web archives.

Active Mass Digitization Projects

The Mass Digitization Team is currently working with a broad range of partners to support the following projects:

Google Library Project

UC Libraries joined the Google Books Library Project in August 2006 and began sending books to Google to be digitized that same year. Since then, UC Libraries have continued to send tens of thousands of books to Google each year for digitization. As of 2020, nine of the ten UC campuses and both Regional Library Facilities have been active partners, with local project teams that participate by sending books to Google for digitization. To date, UC has contributed over 4 million books to be digitized via the Google Books Library Project.

At the start of the project, Google digitized volumes from all languages and time periods regardless of copyright status. Soon after, they began to emphasize digitizing material that could be made more open to the public, such as public domain books and government publications.

UC deposits its copy of each volume digitized by Google into the HathiTrust Digital Library. The full text of the volumes is searchable in Google Books and HathiTrust. Volumes determined not to be in copyright are made “full view”, i.e., available for reading access.

CDL’s Mass Digitization Team provides project coordination, communications, advocacy, and support for project teams on UC campuses and at Regional Library Facilities.

Additional information: UC’s contract with Google

HathiTrust Digital Library

The UC Libraries are a founding partner of the HathiTrust Digital Library. HathiTrust is a collaborative partnership and digital library founded in 2008 by the research libraries of the Committee on Institutional Cooperation (CIC) and the University of California. HathiTrust formed from the desire to create a secure and enduring academic home for mass digitized research library collections resulting from partnerships with organizations such as the Open Content Alliance, Google, and Internet Archive.

The Mass Digitization Team works closely with HathiTrust and UC campus teams to strengthen communications, enhance the relationship, and manage digitized content submission from campuses to the HathiTrust repository. Learn more about HathiTrust and UC’s involvement in the HathiTrust section.

UC Library Reprints

Roughly 200,000 of the UC Libraries’ digitized volumes are available as soft-bound reprints for purchase on Amazon. The UC Reprints are in the public domain (free from copyright restrictions) and are also available for full view access on HathiTrust.

FedDocArc

In September 2014, the UC Libraries approved the creation of a shared UC archive of US federal government documents. The FedDocArc project’s mission is to create a persistent archive containing one print and one digital copy of all US federal government documents owned by the UC Libraries. Print copies are shelved at a UC Regional Library Facility or a UC campus library; digital copies are preserved in the HathiTrust repository.

CDL’s Mass Digitization team helps facilitate Google’s digitization of the federal government documents, and supports the contributing libraries with the deposit of digitized versions into HathiTrust.

Local Campus Digitization

Many UC libraries have active digitization programs on their campuses. When desired, volumes digitized locally by campuses may be deposited into HathiTrust and/or Google Books. Additionally, UC Libraries may choose to update or correct content in digitized books by digitizing missing or erroneous pages and inserting them into existing volumes on HathiTrust and Google Books. CDL’s Mass Digitization team provides standards, guidance, and consultation to assist UC libraries in depositing locally digitized materials into HathiTrust and Google Books.

Non-book Digitization Pilot Projects

CDL’s Mass Digitization and Digital Special Collections teams are collaborating with UC campuses to develop an efficient, cost-effective process to digitize non-book materials from their collections at scale. The processes are being tested and improved iteratively through a series of pilot projects. The goal of the pilots is to create high-throughput digitization workflows that may be copied and repeated across UC campuses.

Previous Projects

Open Content Alliance and Internet Archive

In 2005, the UC libraries became a founding member of the Open Content Alliance (OCA). The OCA was a consortium of research libraries, non-profits, and tech companies that came together to create a publicly accessible archive of digitized texts. Internet Archive was the digitization vendor for the project. The OCA focused on digitizing monographs and serials that were either out of copyright or for which the permission of copyright holders had been obtained.

The UC libraries were one of the first members of OCA to provide books for digitization. UC hosted two Internet Archive digitization centers: one at NRLF (from 2006 to 2008) and one at SRLF (from 2006 to 2009). In all, close to 200,000 UC library volumes were digitized by Internet Archive and made freely available to the public. The UC volumes digitized by Internet Archive are open access in HathiTrust and Internet Archive.

Additional information: UC’s contract with Internet Archive