The mass digitization processes practiced at UC libraries are based on efficiently photographing books page by page, then using optical character recognition (OCR) software to produce searchable text. To keep digitization both speedy and economical, human intervention is kept to a minimum. This means that the OCR output is generally used without further revision. Only limited structural markup, such as page numbers, tables of contents, and indices, is included.
The digitization processes are designed to have minimal impact on the physical condition of the books. Volumes are usually unavailable to patrons for roughly four weeks during the digitization process.
From the Shelf to Your Screen
- A pick list of unscanned library books is generated.
- The UC library team uses the pick list to gather books from the shelf and pack them on carts.
- The carts full of books are sent to the digitization vendor.
- The vendor scans each book and repacks the cart.
- The vendor processes the digital files to support searching and compiles the scanned images into a digital volume.
- The carts of books are returned to the library and re-shelved.
- The digital volumes are uploaded to HathiTrust and Google Books, where they join millions of other discoverable books.
The Story of the Digital Book
Go behind the scenes to see how UC’s Mass Digitization team connects researchers with books that would otherwise be out of reach.