
Introducing Zephir, HathiTrust’s New Metadata Management System

The California Digital Library (CDL), in collaboration with staff from the University of Michigan, has developed and implemented a new bibliographic metadata management system, Zephir, for the HathiTrust Digital Library. HathiTrust is a large-scale collaborative repository of digital content from research libraries, including content digitized through the Google Books project and Internet Archive digitization initiatives as well as content digitized locally by libraries. The University of California is a founding partner in HathiTrust. The University of Michigan has managed HathiTrust bibliographic metadata since HathiTrust’s inception; now, after working closely with University of Michigan staff on the transition to Zephir, CDL is taking on this role.

Bibliographic metadata, typically stored as records that describe the resources in a collection and help users find and access them, is critically important to HathiTrust. Most significantly, bibliographic metadata submitted to and processed by Zephir is used to populate the HathiTrust online catalog, inform the rights determinations that facilitate access to digitized resources, and trigger the ingest of digitized resources into the HathiTrust repository.

The work of designing and implementing Zephir has highlighted the modularity of the HathiTrust repository and its capacity for distributed development of repository infrastructure beyond the core systems and services provided by the University of Michigan. Zephir will provide flexibility as HathiTrust services evolve, including the potential to coordinate with the HathiTrust Research Center and to incorporate new technical developments that improve and extend how metadata is updated, used, and shared.

This project is also significant because Zephir is the first piece of core HathiTrust infrastructure hosted by a partner outside the University of Michigan Library. It is particularly appropriate that the system was developed by CDL, an established leader in large-scale bibliographic systems since the original version of the Melvyl Catalog in the early 1980s.

While the move to Zephir will be transparent to end users, the back end features a new submission process for contributors that gives them feedback, including error reports and histograms detailing MARC tag usage in their records. When the repository holds multiple digital copies of a given title, the system selects a single representative record to describe them, using a flexible, extensible set of scoring rules to make that choice.
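To make the idea of rule-based selection concrete, here is a minimal sketch of how an extensible set of scoring rules might pick one representative record from several candidates. It is not Zephir's implementation: the simplified `Record` model and the rules `has_title`, `field_count`, and `has_subjects` (with their point values) are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """Hypothetical simplified bibliographic record: MARC tag -> list of field values."""
    record_id: str
    fields: dict[str, list[str]] = field(default_factory=dict)


# A scoring rule maps a record to a number; higher means "more representative".
ScoringRule = Callable[[Record], float]


def has_title(rec: Record) -> float:
    # Hypothetical rule: reward records that carry a title statement (MARC 245).
    return 10.0 if rec.fields.get("245") else 0.0


def field_count(rec: Record) -> float:
    # Hypothetical rule: one point per field occurrence, favoring fuller records.
    return float(sum(len(values) for values in rec.fields.values()))


def has_subjects(rec: Record) -> float:
    # Hypothetical rule: reward records with topical subject headings (MARC 650).
    return 5.0 if rec.fields.get("650") else 0.0


# The rule set is just a list, so new rules can be appended or swapped in.
DEFAULT_RULES: list[ScoringRule] = [has_title, field_count, has_subjects]


def score(rec: Record, rules: list[ScoringRule] = DEFAULT_RULES) -> float:
    """Total score for one record under the given rules."""
    return sum(rule(rec) for rule in rules)


def choose_representative(
    records: list[Record], rules: list[ScoringRule] = DEFAULT_RULES
) -> Record:
    """Return the highest-scoring record; ties go to the earliest in the list."""
    if not records:
        raise ValueError("need at least one candidate record")
    # max() returns the first maximal element, which gives a stable tie-break.
    return max(records, key=lambda r: score(r, rules))


if __name__ == "__main__":
    a = Record("a", {"245": ["Histoire de Babar"], "100": ["Brunhoff, Jean de"]})
    b = Record("b", {"245": ["Histoire de Babar"], "100": ["Brunhoff, Jean de"],
                     "650": ["Elephants"]})
    print(choose_representative([a, b]).record_id)
```

Keeping each rule as an independent function makes the rule set easy to extend or reweight without touching the selection logic, which is one plausible reading of "flexible, extensible" here.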

The underlying technology for the new system was named for Zephir, Babar the Elephant’s monkey friend in the French children’s stories by Jean de Brunhoff. Hathi is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength.

In hosting Zephir, CDL is pleased to support our 10 UC campus libraries and the entire HathiTrust partnership of 89 research libraries.

See HathiTrust’s announcement of Zephir’s release, and learn much more about how Zephir works.

By Kathryn Stine, Zephir Project Manager and Metadata Analyst