Opening More Books for Public Access – HathiTrust’s CRMS Project
For the past eight years, HathiTrust’s Copyright Review Management System (CRMS) project has developed the tools, infrastructure, and practices needed to support collaborative copyright review of books in HathiTrust. Books were selected for investigation based on how likely they were to be found in the public domain and opened for reading access. The project focused on two pools of books: monographs published in the United States between 1923 and 1963, and monographs published in the UK, Australia, and Canada between 1870 and (roughly) 1950. The US volumes were investigated to determine whether copyright formalities (such as registration and renewal) had been met. The non-US volumes were investigated to see whether their authors’ death dates could be found and verified. As a result of these reviews, over 500,000 copyright determinations were made. Of these, over 300,000 volumes were found to be in the public domain and opened for public access.
HathiTrust’s Rights Algorithm
To understand the work of HathiTrust’s CRMS project, it is necessary to know how copyright status is commonly determined for volumes in HathiTrust. For the majority of books flowing into HathiTrust, rights status is determined by an automated algorithm that checks two elements of a volume’s bibliographic metadata: the publication date and the country of publication. Books published in the United States before 1923, or outside the United States before 1870, automatically receive a “public domain” rights status and are open for anyone in the world to read. Books published outside the United States between 1870 and 1922 automatically receive a “public domain – US” rights status and are open only to people using a US IP address. The algorithm also checks whether a volume is a US government document, since these automatically receive a “public domain” rights status.
HathiTrust’s algorithm is conservative by necessity. To ensure compliance with copyright law, it errs on the side of keeping books closed rather than risk opening a book that is still in copyright. Books published in the US between 1923 and 1963 are automatically closed by the algorithm, yet a book from this era may legally be in the public domain if its copyright was not formally registered and renewed. The copyright status of books published outside the US often depends on the date of the author’s death, but because this information is seldom included in bibliographic metadata, an algorithm cannot make an accurate copyright determination for volumes published after 1870. Books published in the UK, Australia, and Canada after 1870 were targeted for CRMS review because these countries have similar copyright regimes and their books could be reviewed easily by trained staff. Copyright law is complex, and determining rights status by algorithm alone means that tens of thousands of books that may legally be in the public domain remain closed to the public, and even to HathiTrust members.
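The date-and-country rules described above amount to a small decision procedure. The Python sketch below is a simplified illustration, not HathiTrust’s actual code: the `Volume` record, the function names, the country codes, and the status labels ("pd", "pdus", "closed") are all hypothetical. `crms_candidate` marks volumes that fall into one of the two CRMS review pools, using 1950 as a stand-in for the “roughly 1950” end of the non-US pool.

```python
from dataclasses import dataclass

# Hypothetical country codes for the non-US CRMS review pool.
NON_US_POOL = {"UK", "CA", "AU"}


@dataclass
class Volume:
    """Hypothetical record holding only the metadata the rules above use."""
    pub_year: int
    country: str             # illustrative codes: "US", "UK", "CA", "AU", ...
    us_gov_doc: bool = False


def rights_status(vol: Volume) -> str:
    """Simplified sketch of the automated date-and-country rules.

    Returns an illustrative label: "pd" (open worldwide), "pdus"
    (open to US IP addresses only), or "closed".
    """
    if vol.country == "US":
        # US government documents are public domain regardless of date.
        if vol.us_gov_doc or vol.pub_year < 1923:
            return "pd"
        return "closed"  # 1923 and later: treated as in copyright by default
    if vol.pub_year < 1870:
        return "pd"
    if vol.pub_year <= 1922:
        return "pdus"
    return "closed"  # 1923 and later: treated as in copyright by default


def crms_candidate(vol: Volume) -> bool:
    """True if the volume falls in one of the two CRMS review pools.

    1950 approximates the "roughly 1950" end of the non-US pool.
    """
    if vol.country == "US":
        return not vol.us_gov_doc and 1923 <= vol.pub_year <= 1963
    return vol.country in NON_US_POOL and 1870 <= vol.pub_year <= 1950
```

For example, `rights_status(Volume(1935, "US"))` returns `"closed"`, while `crms_candidate` returns `True` for the same volume: the kind of book the algorithm must keep closed but human review may be able to open.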
Institute of Museum and Library Services (IMLS) Grants
The CRMS project was created to resolve the uncertain copyright status of thousands of books in HathiTrust. It began in 2008 with funding from a National Leadership Grant from the Institute of Museum and Library Services (IMLS). After the first CRMS project was completed, IMLS awarded two additional grants (in 2011 and 2014). Over time, CRMS became an increasingly collaborative effort, with up to 20 HathiTrust member institutions and 60 staff members pledging time and engaging in review work.
At the end of February 2016, HathiTrust concluded the grant-funded portion of the CRMS project. Although this period has ended, HathiTrust staff continue to coordinate copyright reviews, and 10 partner institutions have committed to continue reviewing volumes through the end of 2016.
As a result of HathiTrust’s CRMS project:
- 331,889 copyright determinations were made for US books published between 1923 and 1963. Of these, 177,398 (or 53.5%) were determined to be in the public domain and are available to HathiTrust users.
- 177,912 copyright determinations were made for books published in the UK, Canada, and Australia. Of these, 144,733 (or 81.4%) were determined to be in the public domain and are publicly available.
- Since spring of 2015, 61,514 state government documents in HathiTrust have been identified as potential candidates for review. Of the 13,387 reviewed so far, 9,726 (or 73%) were found to be in the public domain and were opened for public access.
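Each percentage in the list above is the number of volumes found to be in the public domain divided by the number reviewed. The short Python sketch below recomputes those shares from the reported counts; the dictionary and function names are illustrative only, and for the state documents the denominator is the number reviewed so far, not the full candidate pool.

```python
# Counts reported in the list above: (volumes reviewed, found public domain).
results = {
    "US books, 1923-1963": (331_889, 177_398),
    "UK, Canada, and Australia books": (177_912, 144_733),
    "State government documents": (13_387, 9_726),
}


def percent_public_domain(reviewed: int, public_domain: int) -> float:
    """Share of reviewed volumes found to be in the public domain, in percent."""
    return 100 * public_domain / reviewed


# Print one line per review pool, rounded to one decimal place.
for pool, (reviewed, public_domain) in results.items():
    print(f"{pool}: {percent_public_domain(reviewed, public_domain):.1f}%")
```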
In addition, the CRMS project team has developed a CRMS Toolkit (to be published later this year) that details the methodology developed over eight years of hands-on copyright review experience. The Toolkit will allow the CRMS approach to be replicated and used in a variety of new ways.
UC Participation in CRMS
Librarians and staff from UC Irvine, UC Los Angeles, UC San Francisco, and CDL participated in the CRMS project, contributing over 50,000 individual reviews of HathiTrust volumes.
Here are some interesting books CDL’s CRMS team discovered while reviewing volumes for copyright determination:
- On the trail of Don Quixote : being a record of rambles in the ancient province of La Mancha by August F. Jaccaci; illustrated by Daniel Vierge.
- Dame Wiggins of Lee, and her seven wonderful cats : a humorous tale / written principally by a lady of ninety; edited, with additional verses, by John Ruskin … and with new illustrations by Kate Greenaway … 1885
- Journal of researches into the natural history & geology of the countries visited during the voyage around the world of H. M. S. ‘Beagle’ under the command of Captain Fitz Roy, R. N., by Charles Darwin …
- The North Sea on the eve of war / by Joseph Conrad. London : Printed for private circulation, 1919. Edition limited to 25 copies.
Additional information about HathiTrust & CRMS Project
- CRMS-US (December 2008 – November 2011)
- Copyright Reviews and Access in HathiTrust (Webcast, March 2016)