WAS to Archive-It Migration Update

The Web Archiving Service (WAS) migration to Internet Archive’s Archive-It service reached two major milestones last week: the WAS crawlers completed their final crawl, and the last of the UC institutions moved their collections to Archive-It. (See the previous CDLINFO article for background on the migration.)

I would like to thank all the UC curators for their hard work and patience in making this migration happen, and to give a HUGE thank-you to the Archive-It staff for all their support and expertise. Thank you also to the WAS migration team for their work on the project: Scott Fisher, Mark Reyes, David Moles, Ken Weiss, and Marisa Strong.

Migration Summary

In the end, over 140 UC collections and 47.7 terabytes of data were moved!

Largest collections:

  • UCR Library + Water Resources Collections Archive: 9.2 TB/6 collections
  • UCLA: 7.8 TB/35 collections
  • UC Davis: 5.5 TB/10 collections
  • UC San Diego: 5.4 TB/18 collections

There are 15 Archive-It accounts for UC institutions: one for each UC Library; additional accounts for UCSF, the UC Berkeley Institute for Research on Labor and Employment, and UC Libraries[1]; and, outside the libraries, one each for the UCOP communications department and the UCB NASA Wavelength project.

Ten non-UC institutions have migrated to Archive-It: Bentley Historical Library, University of Michigan; Emory University Libraries; Mount Holyoke College; New York University Libraries/Tamiment Library (Labor & the Left); Northwestern University Library; Purdue University Libraries; Smith College Libraries; University of Arkansas Library; University of Illinois at Urbana-Champaign Libraries; and USDA Economic Research Service.

The WAS team worked closely with all the institutions and Archive-It to ensure a smooth migration process. All the collections have been moved over, and previously crawled data for all but five institutions is now fully integrated into Archive-It search and browse. The remaining content is in the queue at Archive-It and will be available in the coming weeks.

Access

You can search by Organization or Collection Name from the Archive-It home page.

What’s Next?

Our next milestone is the decommissioning of the WAS public interface on Dec. 1, 2015. Redirects will be in place so that requests for content that was public in WAS are sent to Archive-It.

Going forward, the focus will be on collaborative activities among CDL, UC Libraries, and web archiving community partners to expand our collective capacity to steward web archive collections, with an emphasis on tools for researchers and curators, collection development, and discovery and access. To this end, we are exploring the feasibility of a couple of grant opportunities with partners; more information is coming soon.


[1] The UC Libraries account will include collections that CDL staff created, collections that CDL staff created in collaboration with UC librarians, and the CA.gov collection, which is created and managed in partnership with government information librarians from UC, the California State Library, and Stanford.