
Glossary of Digital Library Terms


Administrative Metadata
Used for managing the digital object and providing more information about its creation and constraints governing its use. See also digital provenance administrative metadata, rights management administrative metadata, source administrative metadata, and technical administrative metadata.
Administrator
A person or entity authorized by the producer to define users and their roles within an inventory. An administrator also has the rights of a submitter and a consumer.
AIP: Archival Information Package
The internal representation of an object in the Digital Preservation Repository, including all data generated upon ingest (e.g., descriptive metadata) needed to manage and preserve it.
Alternate Object Identifier
An optional unique identifier for an object supplied by the producer. Also known as a local identifier.
API: Application Programming Interface
A set of instructions or rules that enable two operating systems or software applications to communicate.
ARK: Archival Resource Key
A naming scheme for persistent access to CDL-hosted digital objects. An ARK is a specially constructed, actionable, and persistent URL encapsulating a globally unique identity that is independent of the current service provider. Each ARK is by definition bound to three things: object access, object metadata, and a faceted commitment statement about providing persistent access. With a single question mark (?) appended, an ARK connects users to the object’s metadata; with a doubled question mark (??), it connects to the provider’s commitment statement. See the ARK web site for more information.
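The two inflections described above can be sketched in a few lines of Python. This is only an illustration: the resolver host and the ARK itself are hypothetical placeholders, not real identifiers.

```python
# Hypothetical ARK used only for illustration; not a real identifier.
EXAMPLE_ARK = "https://resolver.example.org/ark:/12345/x9abc"


def ark_inflections(ark_url: str) -> dict[str, str]:
    """Return the three URL forms that an ARK is bound to."""
    base = ark_url.rstrip("?")  # tolerate input that already carries an inflection
    return {
        "object": base,             # plain ARK: access to the object itself
        "metadata": base + "?",     # single question mark: the object's metadata
        "commitment": base + "??",  # doubled question mark: commitment statement
    }


for name, url in ark_inflections(EXAMPLE_ARK).items():
    print(f"{name}: {url}")
```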
Authorized Signator
The person designated by the producer as having signature authority for contracts and legal agreements. This person signs the submission agreement.


BAM/PFA: Berkeley Art Museum and Pacific Film Archive
The visual arts center of UC Berkeley. Through art and film programs, collections, and research resources, BAM/PFA aspires to be locally connected and globally relevant, engaging audiences from the campus, community, and beyond. See the BAM/PFA web site for more information.
Behaviors Metadata
Metadata used to associate executable behaviors with content in the METS object. A behavior section has an interface definition element that represents an abstract definition of the set of behaviors represented by a particular behavior section. A behavior section also has a behavior mechanism, which is a module of executable code that implements and runs the behaviors defined abstractly by the interface definition.


CDL Guidelines for Digital Objects
A set of guidelines for the creation and manipulation of content files and metadata within CDL repositories. See the Guidelines for Digital Objects.
CGI: Common Gateway Interface
A standard for applications to work in tandem with web servers. In the interface customization tool kit, CGI refers to the application that executes the search and generates the retrieval set from the collection of CDL METS records.
Commitment Statement
A declaration by an organization of its intention to retain and make available a given object or set of objects. This may include such things as the length of time an object identifier will be valid and how invariant the object’s content will be.
Complex Digital Object
Includes two or more content files (and their format variants or derivatives) and corresponding metadata. The content files are related as parts of a whole and are sequenced logically, such as pages. For example, a complex digital object could consist of a multi-page diary scanned as TIFF images, from which are generated display images (JPEGs and GIFs), plus a transcription of the diary and the metadata for each file. See also digital object, simple digital object.
Component
A content file or metadata package that is part of a digital object.
Consumer
A person or client system authorized by the producer to view or disseminate objects from the Digital Preservation Repository.
Content File
A file that is either born digital or produced using various kinds of capture application software. Audio, image, text, and video are the basic kinds of content files. Versions of a content file may be dispersed across several file formats. For example, an image may be scanned into a TIFF file and then JPEG and GIF files may be created from the TIFF file to increase delivery speeds and protect property rights.
Crawl
The activity of using software to recursively download web documents by following links. There are a variety of crawl methods, including: focused crawl, smart crawl, incremental crawl, targeted crawl, and customized crawl. See also crawler.
Crawler
Also known as a spider or robot. Software that automatically traverses the web by downloading documents and following links from page to page. See also crawl.
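A minimal sketch of the traversal a crawler performs, assuming a small in-memory link graph in place of real HTTP downloads. The page paths are hypothetical, and a real crawler would also handle politeness rules, errors, and link extraction from HTML.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
LINKS = {
    "/": ["/about", "/collections"],
    "/about": ["/"],
    "/collections": ["/collections/diary", "/about"],
    "/collections/diary": [],
}


def crawl(seed: str, links: dict[str, list[str]]) -> list[str]:
    """Visit pages breadth-first from the seed, following each link only once."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        # "Download" the page by looking up the links it contains.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


print(crawl("/", LINKS))
```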
CrossRef
A collaborative reference linking service. See the CrossRef web site for more information.
To take care of, to manage, or to provide access to.
Curation Micro-Services
See Micro-Services
Customized Crawl
A web crawl optimized for a particular web site based on human knowledge of the structure and content of the site.


Dark Archive
An archive that is inaccessible to the public. It is typically used for the preservation of content that is accessible elsewhere. See also dim archive, light archive.
Data Content Standard
Rules for determining and formulating data values within metadata elements. Examples include the Anglo-American Cataloging Rules (AACR), Cataloging Cultural Objects (CCO), Describing Archives: a Content Standard (DACS), and Graphic Materials (GIHC).
DDI: Data Documentation Initiative
An effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for data sets in the social and behavioral sciences. Data archives in the UC system use DDI to preserve collections of materials used in quantitative research. See the DDI web site for more information.
Data Interchange Standard
Used to define the encoding, storage, transmission, and interchange of data values represented within a data structure standard. Examples include the Dublin Core RDF/XML, MODS, and MARC21 formats.
Data Structure Standard
Standards that define metadata elements. Examples of data structure standards include Dublin Core, MODS, and MARC21.
Data Value
A discrete unit of data within a metadata element, i.e., the data encoded within a tag.
Data Value Standard
Data value standards govern the choice and form of controlled forms of data values within metadata elements. These controlled data values are often found in the form of thesauri, vocabulary lists, and authority files. Examples include the Library of Congress’ Subject Cataloging Manual (SCM) and the Art & Architecture Thesaurus (AAT) rules.
Deep Web
Consists of materials that are available by HTTP and are publicly available, but are not included in standard public indexes such as Google. This includes materials that are difficult or impossible to crawl, such as databases.
Descriptive Metadata
Metadata used for the discovery and interpretation of the digital object. Descriptive metadata may be referred to externally or indirectly by pointing from the digital wrapper to a metadata object, a MARC record, or an EAD instance located elsewhere. Or, descriptive metadata may be embedded in the appropriate section of the digital wrapper.
Digital Assets
A collection of computer files that contain intellectual content (images, texts, sounds, video) and/or descriptive metadata of the content and its digital format. They represent an investment for the depositor and an information resource for the researcher.
DLF: Digital Library Federation
A consortium of libraries (including the CDL) and related agencies that are pioneering the use of digital technologies to extend their collections and services. See the DLF web site for more information.
Digital Object
An entity in which one or more content files and their corresponding metadata are united, physically and/or logically, through the use of a digital wrapper. See also complex digital object, simple digital object.
DMP Tool: Data Management Plan Tool
DMP Tool helps researchers create and manage data management plans.
DOI: Digital Object Identifier
A stable identifier (URL). See the DOI web site for more information.
Digital Object Production
The process by which the content file(s) and corresponding metadata are united in the digital wrapper, e.g., the MoA II XML DTD or METS. The process may be accomplished manually, or it may be automated to increasing degrees using spreadsheets and database applications.
Digital Preservation
The managed activities necessary for ensuring the long-term retention and usability of digital objects.
DPR: Digital Preservation Repository
A set of services that support the long-term retention of digital objects for the benefit of the University of California community. Also known as the UC Libraries Digital Preservation Repository.
DPR Administrator
A Digital Preservation Repository staff member who serves as proxy and performs administrative functions, such as registration and updates.
DPR Designated Community
The University of California libraries that may deposit content in the Digital Preservation Repository.
Digital Provenance Administrative Metadata
Administrative metadata that is the history of migrations, transformations, or translations performed on a digital library object’s content files from their original digital capture or encoding. It should contain information regarding the ultimate origin of the content files.
Digital Wrapper
A structured text file that binds digital object content files and associated metadata together and that specifies the logical relationship of the content files. METS is an emerging, XML-based international standard for wrapping digital library materials. All of the content files and corresponding metadata may be embedded in the digital wrapper and stored with the wrapper. This is physical wrapping or embedding. Or, the content files and metadata may be stored independently of the wrapper and referred-to by file pointers from within the wrapper. This is logical wrapping or referencing. A digital object may partake of both kinds of wrapping.
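The difference between physical and logical wrapping can be sketched with Python's standard XML library. This is a simplified illustration that uses METS element names (fileSec, file, FLocat, FContent, binData, structMap, fptr); it has not been validated against the METS schema, and the file URL and content are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_wrapper() -> ET.Element:
    """Build a tiny wrapper with one referenced file and one embedded file."""
    root = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")

    # Logical wrapping: the content file is stored elsewhere and pointed to.
    referenced = ET.SubElement(file_grp, f"{{{METS}}}file", ID="page1")
    ET.SubElement(
        referenced,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK}}}href": "https://example.org/diary/page1.tif"},
    )

    # Physical wrapping: the content is embedded in the wrapper itself.
    embedded = ET.SubElement(file_grp, f"{{{METS}}}file", ID="transcript")
    content = ET.SubElement(embedded, f"{{{METS}}}FContent")
    bin_data = ET.SubElement(content, f"{{{METS}}}binData")
    bin_data.text = base64.b64encode(b"Dear diary ...").decode("ascii")

    # The structural map records how the files relate as parts of a whole.
    struct_map = ET.SubElement(root, f"{{{METS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS}}}div", TYPE="diary")
    for file_id in ("page1", "transcript"):
        ET.SubElement(div, f"{{{METS}}}fptr", FILEID=file_id)
    return root


wrapper = build_wrapper()
print(ET.tostring(wrapper, encoding="unicode"))
```

Because this one wrapper both references and embeds content, it is an example of an object that partakes of both kinds of wrapping.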
Dim Archive
An archive that is inaccessible to the public, but that can easily be made accessible if required. It’s typically used for the preservation of content that is accessible elsewhere. See also dark archive, light archive.
DIP: Dissemination Information Package
An external representation of an object exported from the Digital Preservation Repository, optionally including an Archival Information Package, Submission Information Package, and object metadata.
DTD: Document Type Definition
A common way of defining the structure, elements, and attributes that are available for use in an SGML or XML document that complies with the DTD. For example, the TEI DTD governs the structure, elements, and attributes of a TEI document.
Drop-Down Menu
A selection field that only displays one choice at first; the list box is hidden until the user expands it by clicking on it with the mouse or some other action. It is not the same thing as a pull-down menu.
Dublin Core
A simple set of metadata elements used as a common meeting ground between richer, more granular metadata standards from diverse groups. Allows for generalizability and the support of cross-collection discovery. See the Dublin Core Metadata Initiative (DCMI) web site for more information.


Element
A discrete component of metadata, or a discrete component of a data structure defined by a DTD or schema (often represented through markup in the form of a tag).
Emulation
The imitation of a computer system, performed by a combination of hardware and software, that allows programs to run between incompatible systems. Or, the ability of a program or device to imitate another program or device.
EAD: Encoded Archival Description
A DTD (Document Type Definition) that assists in the creation of electronic finding aids. Developed at UC Berkeley, it is now maintained as a standard by the Library of Congress and sponsored by the Society of American Archivists. An EAD can be used to represent complete archival structures, including hierarchies and associations. See the Library of Congress EAD glossary for more terms.
EZID
EZID (easy-eye-dee) is a service that makes it simple for digital object producers (researchers and others) to obtain and manage long-term identifiers for their digital content.


File Inventory Metadata
A list of all files (content files and corresponding metadata) comprising the digital object.
Finding Aid
A guide or inventory to a collection held in an archive, museum, library, or historical society. It provides a detailed description of a collection, its intellectual organization and, at varying levels of analysis, of individual items in the collection.
Focused Crawl
A web crawl designed to download online documents within specific parameters, such as file type, size, or location. The crawler follows only certain kinds of links and ignores others. Examples: a crawl might focus on HTML and PDF files and ignore sound and video files. Or, a crawl might focus on one domain and not follow any links outside of that domain.
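The link-filtering rule in a focused crawl can be sketched as a single predicate. This is a minimal illustration that assumes a hypothetical domain and uses file extensions as a stand-in for file type; a real crawler would more likely check the Content-Type of each response.

```python
from urllib.parse import urlparse

ALLOWED_DOMAIN = "library.example.edu"  # hypothetical domain to stay within
ALLOWED_SUFFIXES = (".html", ".htm", ".pdf", "/")  # HTML and PDF only


def should_follow(url: str) -> bool:
    """Follow a link only if it stays in the domain and has a wanted file type."""
    parts = urlparse(url)
    in_domain = parts.hostname == ALLOWED_DOMAIN
    path = parts.path or "/"  # a bare host counts as its root page
    wanted_type = path.lower().endswith(ALLOWED_SUFFIXES)
    return in_domain and wanted_type


for link in (
    "https://library.example.edu/guides/index.html",
    "https://library.example.edu/audio/talk.mp3",
    "https://other.example.org/page.html",
):
    print(link, should_follow(link))
```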
Full-Content Harvest
A full-text harvest that stores parsed segments (up to the full page) extracted from the source item to present search terms in context within a full-text index.
Full-Text Harvest
The harvest of text from target pages to build a full-text index with links to the target resources. This is the same kind of harvest that Google and other search engines perform to build their indexes.
FRBR: Functional Requirements for Bibliographic Records
Provides a framework for relating the data that are recorded in bibliographic records to the needs of the users of those records. It uses an entity-relationship model of metadata for information objects, instead of the single flat record concept underlying current cataloging standards. The FRBR model includes four levels of representation: work, expression, manifestation, and item. See the FRBR final report at the International Federation of Library Associations and Institutions web site.


GenDB
A tool for gathering the raw structural, descriptive, and administrative metadata pertaining to digital materials created by the UC Berkeley Library Systems Office. WebGenDB is eventually expected to support all UC Berkeley digitizing projects. WebGenDB/GenDB has been adjusted to better support MODS and MIX output now that these are emerging as the primary standards for target encodings.


Harvest
The process by which software can collect metadata packages from remote locations that describe information resources available at those locations. See also metadata harvest, participatory metadata harvest, full-text harvest, and full-content harvest.
Harvester
Software that performs the harvest function.
The Robert B. Honeyman Collection of Early Californian and Western American Pictorial Material is one of the premier pictorial collections of the Bancroft Library at UC Berkeley. The collection, containing more than 2,300 items, includes original paintings, drawings, prints, sketchbooks, lettersheets, and other pictorial materials, with emphasis on early California and the Gold Rush.
HOPS: Heads of Public Service
A committee among the all-campus groups of SOPAG (Systemwide Operations and Planning Advisory Group). See the HOPS web page for more information.


Incremental crawl
Designed to update a previous crawl. Evaluates web pages and documents based on previous crawls and downloads only those that have had changes, additions, and deletions.
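One way to decide what an incremental crawl needs to download is to compare content digests saved from the previous crawl with the current state of the site. The sketch below assumes the current content is already at hand, which simplifies the picture; a real crawler would more likely rely on HTTP validators such as Last-Modified or ETag before downloading. All paths and content are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint a document so that later changes can be detected."""
    return hashlib.sha256(content).hexdigest()


def plan_incremental_crawl(
    previous: dict[str, str], current: dict[str, bytes]
) -> dict[str, list[str]]:
    """Compare the site's current state with digests from the previous crawl."""
    added = sorted(url for url in current if url not in previous)
    deleted = sorted(url for url in previous if url not in current)
    changed = sorted(
        url
        for url in current
        if url in previous and digest(current[url]) != previous[url]
    )
    return {"added": added, "changed": changed, "deleted": deleted}


# Digests recorded by the previous crawl, and the site as it looks now.
previous = {"/a": digest(b"one"), "/b": digest(b"two"), "/c": digest(b"three")}
current = {"/a": b"one", "/b": b"TWO", "/d": b"four"}

plan = plan_incremental_crawl(previous, current)
print(plan)
```

Only the pages listed under "added" and "changed" need to be downloaded again; "/a" is untouched and is skipped.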
Ingest
The process by which a digital object or metadata package is absorbed by a different system than the one that produced it.
Inside CDL
The web site primarily for UC library staff that provides access to the working documents of the CDL.
Inventory
A set of digital objects to be ingested into the Digital Preservation Repository. The objects will be submitted on behalf of a producer according to the terms of an inventory definition.
Inventory Definition
A document signed by both the producer and Digital Preservation Repository staff that describes an inventory and records the negotiated data model, profile, rights agreements, and transmission method.


JARDA: Japanese American Relocation Digital Archive
A digital “thematic collection” within the OAC documenting the experience of Japanese Americans in World War II internment camps. The JARDA web site includes a broad range of digital objects, including photographs, documents, manuscripts, paintings, drawings, letters, and oral histories. These materials are described and inventoried in 28 different finding aids. Access to the digital content is also provided through Melvyl®, UC’s online union catalog.
JHOVE2: JSTOR/Harvard Object Validation Environment
An open source software tool for validating digital object formats and generating technical metadata. See the JHOVE web site.


Light Archive
An archive that is accessible to the public. See also dim archive and dark archive.
A URL that references resources integral to the digital object. In some instances, these references may be to internal parts of the object (e.g., another sub part of the overall digital object). In other instances, these references may be to resources that exist outside and independent of the digital object but that are, nevertheless, an important part of the digital object’s content.
Link Resolver
Software that brings together information about the cited resource, the user, and the library’s many subscriptions, policies, and services. For the software to work, the content providers must be willing to participate as sources (databases or sites that can provide a link from a reference). The link resolver becomes activated when the user clicks on a link or button (“Search for full text”) embedded in the user interface of PubMed (or other services). Using the OpenURL framework, information is bundled together from the source and sent to the resolver software, which processes the data and compares it to the Knowledgebase. The user is then presented with a range of options for locating the article, such as a link to the online article or journal, a listing for the library’s print holding for that title, interlibrary loan, or document delivery options.
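The bundling step can be illustrated by building an OpenURL from a citation. The sketch below uses the key/encoded-value form of OpenURL 1.0 (Z39.88-2004) for a journal article; the resolver address and every citation value, including the ISSN, are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical resolver


def build_openurl(citation: dict[str, str]) -> str:
    """Bundle citation metadata into an OpenURL aimed at the link resolver."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Referent ("rft.") keys describe the cited item.
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Made-up citation values for illustration only.
url = build_openurl(
    {"genre": "article", "issn": "1234-5678", "volume": "12", "spage": "45", "date": "2009"}
)
print(url)

# The resolver would parse the same keys back out before consulting its knowledge base.
print(parse_qs(urlparse(url).query)["rft.issn"])
```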
Lot Identifier
A Digital Preservation Repository identifier for a set of digital objects that were submitted during a specific time period.


MARC21: MAchine-Readable Cataloging
Data structure and interchange standard for the representation and communication of bibliographic and related information in machine-readable form. The MARC21 format is maintained by the Library of Congress’ Network Development and MARC Standards Office. See the MARC web site for more information.
MoA II: Making of America II
A DLF project to create a digital library object standard by encoding defined descriptive, administrative, and structural metadata, along with the primary content, inside a digital library object. The cornerstone of the MoA II effort is an XML DTD that defines the digital object’s elements and encoding; this MoA II DTD is the direct predecessor to METS. See the MoA II report for more information.
Merritt
Merritt is a cost-effective repository service that lets the UC community manage, archive, and share its valuable digital content. Use Merritt to provide long-term preservation of digital assets, share your research with others, or meet the data sharing and preservation requirements of a grant-funded project.
Metadata
Structured information about an object, a collection of objects, or a constituent part of an object such as an individual content file. Digital objects that do not have sufficient metadata or become irrevocably separated from their metadata are at greater risk of being lost or destroyed. Ephemeral, highly transient digital objects will often not require more than descriptive metadata. However, digital objects that are intended to endure for long periods of time require metadata that will support long-term preservation. See also administrative metadata, behaviors metadata, descriptive metadata, file inventory metadata, and structural metadata.
METS: Metadata Encoding and Transmission Standard
A standard for encoding descriptive, administrative, and structural metadata about objects within a digital library, expressed using XML. METS is the emerging national standard for wrapping digital library materials. It is being developed by the Digital Library Federation (DLF) and is maintained by the Library of Congress. See the METS web site for more information.
Metadata Harvest
The harvest of existing metadata records from resource repositories, such as through OAI, to gather metadata for query results or index creation.
Micro-Services
A granular set of small, independent, but highly interoperable services that form the core infrastructure of services such as Merritt and EZID. See our wiki for more information.
MODS: Metadata Object Description Schema
An XML schema, and a data structure and interchange standard, used for the creation of original resource description records (and may also be used as an alternative method for representing MARC data). MODS was developed by the Library of Congress’ Network Development and MARC Standards Office. See the MODS web site for more information.
Metasearch
The act of searching more than one database simultaneously through the use of metasearch software. Also called “cross-database searching” or “federated searching.”
Migration
The transfer of digital objects from one hardware or software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and use them in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation; however, it is not always possible to make an exact digital copy of a database or other information object and still maintain the compatibility of the object with a new generation of technology.
Mirroring
The process of making exact replicas of resource items, such as web pages, with slight modifications to hyperlinks as needed to reproduce the behavior of the items. This is similar to using the “save as” function from a browser to save a local copy of the page, including its contents and images.
MOAC: Museums in the Online Archive of California
California museums working with libraries and archives to increase and enhance access to cultural collections. See the MOAC web site for more information.


NSDL: National Science Digital Library
A U.S. government-sponsored digital library of exemplary resource collections and services, organized in support of science education at all levels. See the NSDL web site for more information.


OAC: Online Archive of California
A single, searchable database of finding aids to primary sources and their digital facsimiles held in libraries, museums, archives, and other institutions across California. Primary sources include letters, diaries, manuscripts, legal and financial records, photographs and other pictorial items, maps, architectural and engineering records, artwork, scientific logbooks, electronic records, sound recordings, oral histories, artifacts, and ephemera.
OAI: Open Archives Initiative
Develops and promotes a low-barrier interoperability framework and associated standards for the dissemination of content. Originally, it was designed to enhance access to e-print archives, but it now takes into account access to other digital materials. The essence of the open archives approach is to enable access to web-accessible material through interoperable repositories for metadata sharing, publishing, and archiving. See also OAI-PMH.
OAIS: Open Archival Information System
A conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term. See the OAIS reference model.
OAI-PMH: Open Archives Initiative-Protocol for Metadata Harvesting
A protocol defined by the Open Archives Initiative. It provides a method for content providers to make records for their items available for harvesting by service providers, such as centralized search services. See the OAIster web site for more information.
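A harvest request under OAI-PMH is an ordinary HTTP request whose query string names a verb and its arguments. The sketch below builds ListRecords requests using the protocol's verb, metadataPrefix, set, and resumptionToken arguments; the repository address and the set name are hypothetical, and no request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://repository.example.edu/oai"  # hypothetical repository


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build a ListRecords request for a service provider to send."""
    if resumption_token:
        # A resumption token continues an earlier list and replaces
        # every other argument except the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, set_spec="findingaids"))
print(list_records_url(BASE_URL, resumption_token="abc123"))
```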
Object Identifier
The primary identifier for a digital object within the Digital Preservation Repository, usually an ARK.
ORU: Organized Research Unit
An academic unit established by UC to provide a supportive infrastructure for interdisciplinary research complementary to the academic goals of departments of instruction and research.


Participatory Metadata Harvest
The harvest of implicit metadata, text, and format information from items to create metadata. For example, during a web crawl, a web page could be fed into an automated metadata harvest engine, such as PhraseRate, to create a title, author, description, and keywords based on document formatting and key phrase repetitions.
Portal
A web site or service that provides access to online resources, such as digital objects.
Pre-Submission
A one-time process of information gathering and negotiation between the producer and Digital Preservation Repository staff regarding the possible ingest of a set of objects. This process usually culminates in the signing of a submission agreement and an inventory definition.
Pre-Submission Worksheet
A form filled out by the producer during pre-submission that provides information for Digital Preservation Program staff detailing licensing (rights information) and specifying the number of files, formats, metadata information, and delivery type.
Producer
An organization with legal, financial, and curatorial control over one or more object inventories to be submitted to the Digital Preservation Repository.
Producer Technical Contact
A person acting on behalf of the producer who manages the process (including the technical details) of submitting objects to the Digital Preservation Repository.
Pull-Down Menu
A menu that expands downward when its title is selected with the mouse. A list of options appears as long as the mouse button is held down, and the user can select an option by scrolling through the menu and releasing the mouse button when the desired option is highlighted (as defined by the ComputerUser High-Tech dictionary). A pull-down menu is different from a drop-down menu.


Rights Management Administrative Metadata
Administrative metadata that indicates the copyrights, user restrictions, and license agreements that might constrain the end-use of the content files.


Schema
A common way of defining the structure, elements, and attributes that are available for use in an XML document that complies with the schema.
Security Backup
A second copy of a set of digital assets made to protect against loss due to unintended destruction or corruption of the primary set of digital assets. Security backups are created routinely and are not to be considered archives.
SFX
The link server from Ex Libris that allows context-sensitive linking between web resources in the scholarly information environment. SFX accepts an OpenURL as input from an information resource, which is referred to as an SFX source. See the SFX web page for more information.
Simple Digital Object
Comprised of a single content file (and its format variants or derivatives) and the metadata for that file. For example, a TIFF of the Mona Lisa, a user JPEG, a reference GIF, and the appropriate metadata would comprise a simple digital object. See also digital object, complex digital object.
SPIRO: Slide and Photograph Image Retrieval Online
The visual online public access catalog to the 35mm slide collection of the Architecture Visual Resources Library at UC Berkeley. The collection includes more than 250,000 slides and 20,000 photographs. It was named in honor of the late architectural historian Professor Emeritus Spiro Kostof.
Smart Crawl
A focused crawl based on dynamic criteria. For example, a crawler could be programmed to analyze and evaluate a web site for volatility, the presence of metadata, or the structure and content of a site, etc. The more it crawls, the smarter it gets about what to crawl and what not to crawl.
Source Administrative Metadata
Administrative metadata for describing the source from which the digital content files were produced. Sometimes this will be the original material; other times it will be an intermediary such as a photographic slide, or another digital content file.
Standard Access
A general access path provided by the CDL and the OAC, namely the OAC database. Customized access or portal to the depositor’s digital assets is the responsibility of the depositor and not the CDL or the OAC.
Structural Metadata
Metadata used to indicate the logical or physical relationship of the content files comprising the complex digital object, e.g., the sequence of pages for a group of images of a diary or of detailed images of a larger image. The structural metadata specifies a coherent presentation of the digital content and its pertinent associated metadata.
Submission
The act of transmitting a prepared digital object for deposit into the Digital Preservation Repository. Objects are prepared in accordance with the submission agreement and the CDL Guidelines for Digital Objects.
Submitter
A person or client system authorized by the producer to submit objects to the Digital Preservation Repository. A submitter also has the rights of a consumer.
Submission Agreement
A legal document through which the producer grants the Digital Preservation Repository the right to electronically store, convert, and copy digital assets for preservation purposes.
SIP: Submission Information Package
An external object representation prepared by the producer for the purpose of ingest into the Digital Preservation Repository, where it will be converted automatically to an Archival Information Package.
Surface Web
Includes materials that are publicly available by HTTP, are easily discoverable by crawlers, and are indexed by public indexes such as Google. Sometimes referred to as the static web. The opposite of the deep web.
SOPAG: Systemwide Operations and Planning Advisory Group
A University of California systemwide library planning group. See the SOPAG web site for more information.


Tag
A short, formal name used to indicate data structure or metadata elements, such as <title> in HTML or <unittitle> in EAD.
Targeted Crawl
A web crawl limited to particular web sites based on desired content (compare to a focused crawl). A targeted crawl may or may not be customized.
Technical Administrative Metadata
Administrative metadata that describes the technical attributes of the digital file.
TEI: Text Encoding Initiative
An initiative that publishes Document Type Definitions catering to a wide range of academic electronic text projects. Books, manuscripts, collections of poetry, and other kinds of literary and linguistic texts for online research and teaching that are available electronically are encoded in TEI. See the TEI web site for more information.


A login identity used to authenticate a person or client system as a submitter, consumer, or administrator for an inventory.


Validation
A process to check one or more aspects of a submission for schema errors, file format problems, and ingest parameter inconsistencies that might affect its suitability for preservation. Results of a validation may include any combination of structural analysis information, warning messages, or fatal errors that prevent an object from being ingested.


Web Analyzer
A tool that gathers web metrics and background information about a particular web site to inform administrative, technical, and selection decisions about the capture, curation, and preservation of the digital entities. For example, an analyzer might provide information about the diversity of file formats, the size of the files, an idea about the content, and a comparison to content already captured. With this information, the potential costs, value of the content, and preservation strategy could be determined.
WAS: Web Archiving Service
The Web Archiving Service supported the capture, analysis, archiving, and publication of web sites and documents. WAS has been discontinued; its collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, have been transferred to Internet Archive’s Archive-It.
Web Crawler
See crawler.


XML Gateway: eXtensible Markup Language Gateway
A service that responds to requests (e.g., search requests) with XML-encoded data streams. Queries to the CDL METS Repository are returned as XML data. That XML response is typically transformed into HTML for viewing in a browser by an XSLT stylesheet.
XSLT: eXtensible Stylesheet Language Transformations
Can be used to transform an XML document into another form such as PDF, HTML, or even Braille. XSLT stylesheets work as a series of templates that produce the desired formatting effect each time a given element is encountered. One of the most common uses of XSLT is to apply presentational markup to a document based on rules relating to the structural markup. For example, each time a “title” appears in the structural markup, the text within the element could be put into italics. XSLT can also control the order in which elements and attributes are displayed. This means that tables of contents or indexes can be generated automatically on the basis of the content of a document.
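The template idea can be imitated in plain Python to show how the rules behave. This is not XSLT itself, only an analogue written with the standard library: each source element name is matched to one rule, and the rule is applied every time that element is encountered. The element names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# One "template" per source element name: element -> HTML tag to emit.
TEMPLATES = {"doc": "div", "title": "i", "para": "p"}


def transform(source: ET.Element) -> ET.Element:
    """Apply the matching template to an element, then recurse into its children."""
    result = ET.Element(TEMPLATES.get(source.tag, "span"))
    result.text = source.text
    result.tail = source.tail
    for child in source:
        result.append(transform(child))
    return result


doc = ET.fromstring(
    "<doc><title>Diary of a Voyage</title><para>First entry.</para></doc>"
)
html = ET.tostring(transform(doc), encoding="unicode")
print(html)

# A table of contents can be generated from the same structural markup.
contents = [title.text for title in doc.iter("title")]
print(contents)
```

Here every title element comes out in italics, matching the example in the definition above.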