Attendees: Marcus Banks (UCSF); Ira Bray (CSL); Mitchell Brown (UCI); Melissa Browne (UCD); Heather Christenson (CDL); Penny Coppernoll-Blach (UCSD); Sara Davidson (UCM); Jayne Dickson (CDL); Laine Farley (CDL); Isom Harrison (LLNL); Rachael Hu (CDL); Charleen Kubota (UCB); Rosalie Lack (CDL); Lorna Lueck (UCSB); Patricia Martin (CDL); Ellen Meltzer (CDL); Catherine Mitchell (CDL); Rebecca Morin (CAS); Jose Olivares (LBNL); Michael Oppenheim (UCLA); Leah Prescott (Getty); Tracy Seneca (CDL); Kris Veldheer (GTU); and Sherry Willhite (CDL).
Welcome – Introduction of new members; role of Users Council – Ellen Meltzer
See Users Council Recap of Activities May 8, 2007 – April 30, 2008 handout for the list of new members and the recap of activities.
Role of Users Council: The Council serves as a line of communication between your institution and the CDL and as a channel for input; it’s very valuable to get input from front-line librarians. Users Council commented on many and various topics this year, such as Google Books. We ceased Telnet Melvyl, dealt with a number of database issues such as the big EBSCO change (moving from Expanded Academic ASAP), worked on getting some East Asian databases up and running, and addressed changes in digital special collections and the eScholarship Repository, an exciting new change in Request, improvements in UC-eLinks, etc. Once again, thank you for all your work; we can’t tell you how important you are to us.
The Inside Scoop from CDL – Laine Farley
See Laine Farley’s PowerPoint presentation for details.
Broad view: a full day of interesting details about these projects follows. First, what’s happening at the Office of the President (OP). It’s important that everyone realizes where we are right now: it’s like working in a construction zone. Some things are being torn down and rebuilt, and confusion abounds; some of the infrastructure on which we depend is being rebuilt.
In some ways, CDL is in good shape. We did a reorganization in 2006, so we’ve been through this already. We will be taking a 12% budget cut, which is less than OP as a whole. We’ve been able to fill a number of vacancies, and 13 recruitments are in progress. We’re moving forward with this, and we’ve still been able to do all this work in spite of the vacancies.
Introductions of new CDL staff: Stephen Abrams, Nancy Scott-Noennig, Stephanie Collette, and Adam Brin; Rachael Hu (who was present at the meeting), our User Experience designer; and Debra Bartling, Mike McKenna, and Emily Stambaugh. These are new people with really great backgrounds.
We’re revitalizing our internship program…interns from UCB, San Jose State and UCLA iSchools.
The new University Librarian…can’t give the scoop on this because the decision has not yet been made.
Within the next month or two, we’ll be publishing a CDL Profile commemorating our 10th anniversary and the 25th anniversary of Melvyl. We like to think we listen to users; we put a lot of effort into understanding user needs, and we try to bring value to their experience. We’re trying to move our services into the flow of where people do their work.
Systemwide Library Planning has been collecting statistics for many years, but these statistics haven’t always accurately reflected the online environment. SOPAG recently was tasked with determining the types of statistics that make sense for the online environment. The statistics will be used for the following areas: Strategic Communication, Planning, Systemwide Administrative (required), and Collaborative Operational Management.
Measuring Impact: Indicators (combinations of metrics) help answer the question “what does it all mean?” Anecdotal evidence can capture the real change factor because it is personal, specific, and memorable.
Digital Library Collaboration Workshop: The workshop looked at which of these activities (create, access, manage, and preserve) campuses do and which CDL does. It became very clear that campuses expect CDL to address long-term preservation issues. Other ideas from the workshop: a UC Digital Library Collection, inventories of projects and registries of tools, and pilot projects and Centers of Excellence. SOPAG is discussing outcomes and next steps.
Big Issues: Avalanche of data (cyberinfrastructure): the IT Leadership Council will be taking up some of these issues, and we will be working with them. Data Curation (eScience or data research): where do libraries fit into this picture? Preservation: we’re about to launch a pilot to determine the long-term cost of preservation.
HOPS Big Idea: Common User Experience, with the goal of providing more universal access to services and collections; Improved Content Delivery (how can we improve this?); and Ubiquitous Reference Services.
The University as Publisher is another big initiative. Two initiatives are active right now: “Conference in a Box” and Education, including online courses, repurposing content for other audiences, and open textbooks. These are just getting going, and we will talk more about them in the future.
Collections: Open Access (a lot of renewed interest in having OA initiatives go forward); Value Based Pricing (check with Ivy Anderson for more information on this topic); Research Impact/ROI (measuring the impact of research, i.e., return on investment, and how to measure it); and Digital Collection Development (do we have a policy? More issues are coming up).
Common Themes: Engagement with new partners (vendors such as Google and Microsoft); we’re just learning how to do this. Business strategies: how to be strategic in a business sense, and where we can align. And once again, measuring impact.
Question: I’ve heard about the HOPS Big Idea; is it gelling with the ULs? Answer: They gave the ULs a fairly large list, and the ULs said to pick a few ideas and flesh them out further.
eScholarship Publishing, including Mark Twain Papers – Catherine Mitchell
See Catherine Mitchell’s PowerPoint presentation for details.
To give you some background on eScholarship, there are three tenets to our work: provide low-cost publishing, support widespread distribution, and engage in fostering new models of scholarly publishing.
eScholarship Repository: A full-spectrum publishing platform (pre-prints and reports, peer-reviewed articles, etc.), organized by campus, ORU, and department. The rate of downloads continues to increase (21,000 documents, 6 million full-text downloads). We also offer direct digital publishing services for faculty, research units, etc.
On the repository home page, you can search and browse across papers, monographs, and journals. The SF Estuary & Watershed journal is published solely online; it is a well-respected journal.
We do a lot of collaborative work with UC Press. eScholarship Editions are XML-encoded (2,000 backlist monographs). Monographic series: edited by distributed editorial boards and published in the eScholarship Repository. Finally, Mark Twain Project Online, a digital critical edition of Mark Twain’s works (http://www.marktwainproject.org).
The home page currently offers only the letters. Our next release, in December, will include some of the literary work. The material is incredibly complex…we wanted to give people all the abilities of scholarly research without overcomplicating the interface.
Question: The strike-throughs: are they in the actual letters? Answer: Yes, they are meant to be there. The point of the Mark Twain Papers is to present the most accurate text. The notes are asynchronous: some words are highlighted, and clicking on one takes you to the note.
We have a complex search that allows people to do as much or as little as they want. We also have a faceted browse that allows for serendipitous results.
Often, all you can cite for digital materials is the URL. We created a citation widget, placed throughout the text, that generates citations automatically. You can choose between the short and long citation formats, and you can email citations to yourself.
Thinking about how we serve the community: some surveys were done among faculty working on digital scholarly projects.
Our next phase: focus publishing services on the articulated needs of the scholarly enterprise, writ large and small; respond to disciplinary differences; validate and develop scalable services for non-traditional scholarly publishing efforts; and listen and communicate.
eScholarship Publishing Services: Do you want to launch a low-budget journal, or manage conference proposals and publish proceedings? We also support the creation of disciplinary collections, enable new and emerging kinds of scholarly publication, and help faculty manage the annual biobib requirement and maintain a scholarly home page.
Question: The application that you showed…did you build this from the ground up? Answer: The Mark Twain Project uses an open source application.
Question: Given the tenure issue and scholarly publishing, will CDL be participating in the Bamboo project? Answer: Yes; there’s a lot of potential there, and there is a need to focus on the community.
Question: At a NISO forum, there was a lot of talk about the ePub standards and Adobe Digital Editions. Is this something you’ll move toward? Answer: It’s something we want to be aware of.
Question: Faculty alerts: someone thought one was spam. Is there any way to give the library a heads-up? Answer: Yes, Gail Persily has already contacted us about this.
Question: What about preservation of digital dissertations? Answer: The campus workflows are distinct and complex. We’re committed to doing this, but we’re kind of in a holding pattern now.
Question: Bio pages for faculty? Answer: Selected Works integrates with the repository and lets authors set up pages. We haven’t made it live because we want to do some user testing. For example, someone said this would be great for graduate students (a university-sponsored page with all their works and publications), but we need to do some testing.
Question: What analytics tools do you use? Answer: We use Urchin. We just implemented Urchin about 6 months ago.
Web Archiving Service & Digital Preservation – Tracy Seneca
See Tracy Seneca’s PowerPoint presentation for details.
My background: I worked at UC Berkeley for about 10 years (bibliographic instruction, rights management, etc.), so I look at everything through a public services lens.
What web archiving is and why it is important: I’m not talking about downloading individual files, but about using automated mechanisms to download en masse, and then building a collection composed of more than one site. We are intent on preserving captured content, and the results are searchable.
The Web at Risk is looking at public material at risk. Starting with the hierarchy, there is vulnerability at each level: digital publications, web publications, government web publications, and local government web publications.
Almost every field is starting to publish studies about how quickly citations to web materials become obsolete. This is important to every field of study.
In addition, government information librarians are having difficulty knowing what the government has published. When government agencies produce a PDF and publish it directly to their web site, this is not recorded anywhere.
The scope of the WAS grant (Jan 2005-June 2009) is to build tools that allow librarians to capture, curate, and preserve web-based government and political information. If a web site itself is not of interest but its publications are, librarians should be able to pull documents from the site.
We have done a lot of assessment; almost the entire first year was spent talking to focus groups, public libraries, and academic libraries about the problems of preserving web materials. With each pilot release, we follow up with these curators.
The Library of Congress, the Internet Archive, CDL, and the University of North Texas are targeting the legislative and judicial branches with a broad-scale crawl; CDL and the University of North Texas will also be doing a targeted, in-depth crawl. The results will be brought together into a collection housed at 3 sites. This project alone will bring in at least 3 terabytes of information.
The project partners in this grant include the Library of Congress, New York University, CDL, and the University of North Texas. Our curatorial partners are building collections; these are all available on our wiki, along with assessment materials. There are 19 collections under construction now. We have a lot of California collections; curators are working on targeted water resources collections, political blogs, Middle Eastern political websites, and a southern California wildfires collection.
This service will not end when the grant ends; we’re establishing it as a service for the University of California. At this point we’re working with 37 curators who want to collaborate and build shared collections. For example, the curators working on water resources are each interested in the sites in their own areas, but their work will be used to build a joint database.
We use open source tools with our own curatorial interface. Our next release focuses on rights management and public access features. In the next year, production and development will become available side by side. By early summer 2008, 24/7 access will be available to curators.
Here’s our workflow…you log into a project, define sites to capture, run single or multiple captures of each site and then choose which results to add to a single, searchable collection.
Main page: as you come in, you see this workflow represented. When you create a site, you give it a name and a seed URL and choose whether you want just this page, the directory, or the entire site. You can also decide whether to capture any links from the page.
If you are monitoring an agency’s site and documents, you might not want to capture the links out from this page.
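The scope choices described above (just the page, its directory, or the entire site, plus whether to follow links out) can be sketched as a URL filter. This is a minimal illustration, not the WAS crawler’s actual logic; the function names, the scope labels, and the rule that “directory” means everything under the seed URL’s path are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical scope labels mirroring the three choices in the WAS interface.
SCOPES = ("page", "directory", "site")


def in_scope(seed: str, url: str, scope: str) -> bool:
    """Return True if `url` falls within the chosen scope of the seed URL."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    s, u = urlparse(seed), urlparse(url)
    # Anything on a different host is outside every scope.
    if s.netloc.lower() != u.netloc.lower():
        return False
    if scope == "site":
        return True
    if scope == "page":
        return u.path == s.path
    # "directory": everything under the seed's directory, e.g. "/reports/".
    seed_dir = s.path[: s.path.rfind("/") + 1]
    return u.path.startswith(seed_dir)


def should_capture(seed: str, url: str, scope: str,
                   linked_from_seed: bool, follow_links_out: bool) -> bool:
    """In-scope URLs are always captured; pages linked from the seed are
    captured only if the curator chose to follow links out."""
    if in_scope(seed, url, scope):
        return True
    return follow_links_out and linked_from_seed
```

A curator monitoring an agency’s documents would pick the directory or site scope and leave `follow_links_out` off, so links pointing to other hosts are skipped.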
You can capture daily, weekly, or monthly.
Question: Do you do incremental captures? If you capture a site today and it hasn’t changed the next day,… Answer: At the present time, we can’t do duplicate reduction. This is a very hard nut to crack, but yes, it will be critical.
The descriptive data is going to be the focus of our next release. When we asked people what descriptive data they wanted, no two answers were the same; they ranged from 4 elements to 50.
Once you’ve created the record of the site, you can capture it (you can schedule it or do a one-off capture). Our crawler captures these site by site. Once you’ve hit the capture button, you get some feedback on the capture process. Then, when you’ve gotten the notification that the capture process has completed, you can go look at the capture.
Capture information: site name, how many times you’ve captured it, how long the process ran, and a link into the content. When you look at the content, you get a little feedback about the capture: whether you hit a time limit, which server the content came from, the file types captured, and error messages. There’s also a search tab so you can search the content; you get thumbnails for any images. To view content, just click on the title link; you can toggle back and forth between the item and metadata about it.
Finally, collection building: for the California Wildfires collection, I can add entire captures, or I can add individual files. We want tools that are flexible enough to do both of these things.
WAS features for analysis: because it’s impossible to know what a web site contains until after you capture it, we need tools for understanding the nature of the content.
We have a geographic database of IP addresses, which shows where the data in the Middle Eastern collection is coming from. For some collections this is important, and for others it isn’t.
The compare feature: you can see what is different between 2 capture dates. Changes in PDFs are important; changes in HTML less so (the change could just be a date).
New publications in a capture: this tells you which new documents are available. It also tells you which documents you’ve captured are no longer available on the web.
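A minimal sketch of the two ways of building a collection described above: adding a whole capture or hand-picking individual files. The class and method names are hypothetical, and keying items by (capture date, URL) is an assumption made so that the same page captured on different dates stays distinct.

```python
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One capture of a site (hypothetical model)."""
    site_name: str
    date: str            # capture date, e.g. "2007-10-23"
    files: list[str]     # URLs retrieved in this capture


@dataclass
class Collection:
    """A curated collection built from captures (hypothetical model)."""
    name: str
    # Items keyed by (capture date, URL); the set ignores repeated additions.
    items: set[tuple[str, str]] = field(default_factory=set)

    def add_capture(self, capture: Capture) -> None:
        """Add every file from an entire capture."""
        for url in capture.files:
            self.items.add((capture.date, url))

    def add_file(self, capture: Capture, url: str) -> None:
        """Add a single hand-picked file from a capture."""
        if url not in capture.files:
            raise ValueError(f"{url} is not part of this capture")
        self.items.add((capture.date, url))
```

Because items live in a set, a curator can hand-pick a file first and later add its whole capture without creating duplicates.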
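The compare and new-publications features can be illustrated by diffing two captures, each treated as a mapping from URL to the bytes retrieved. This is a sketch under assumptions: the actual WAS method is not described here, and using SHA-256 content hashes to detect changes is a swapped-in technique. Changed PDFs are flagged separately because PDF changes matter more than HTML changes.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to detect whether a file changed between captures."""
    return hashlib.sha256(content).hexdigest()


def compare_captures(old: dict[str, bytes], new: dict[str, bytes]) -> dict[str, list[str]]:
    """Diff two captures, each a mapping from URL to the bytes retrieved."""
    old_h = {url: fingerprint(body) for url, body in old.items()}
    new_h = {url: fingerprint(body) for url, body in new.items()}
    added = sorted(new_h.keys() - old_h.keys())   # new publications
    gone = sorted(old_h.keys() - new_h.keys())    # absent from the newer capture
    changed = sorted(u for u in old_h.keys() & new_h.keys() if old_h[u] != new_h[u])
    # Surface PDF changes separately; HTML changes are often just a new date.
    changed_pdfs = [u for u in changed if u.lower().endswith(".pdf")]
    return {"new": added, "gone": gone, "changed": changed, "changed_pdfs": changed_pdfs}
```

A real service would likely compare hashes stored with each capture rather than re-reading every file, but the set arithmetic is the same.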
Coming soon…how volatile is this site?
Potential: We can now capture the “chit chat” – popular reaction to historic events; how will researchers interact with captured content once it is in an archive (visualization, text analysis); what is the potential, beyond simple search and display?
Question: Will the software tools be available for other institutions to use? Answer: It’s not in our grant deliverables, but we’re working towards this.
Question: What about dynamically generated sites? Answer: This is not an exact science, and it is a moving target. As new technologies emerge, and with obstacles like password-protected web sites, there will always be some things that the crawlers can’t get. These are definitely questions we’re trying to work out.
Question: Could you explain a little about your preservation strategies? Answer: The service builds on top of the preservation repository. As for replication, the Library of Congress has a copy, the University of North Texas has a copy, and CDL has a copy.
Question: What about the copyright issues of capturing web sites intact? Short answer: That’s our next release, in August. There are still complexities with materials within the public domain. It will be up to each institution to make its own policies.
Bibliographic Services – Patricia Martin
See Patricia Martin’s PowerPoint presentation for details.
Patti Martin is the Director of Bibliographic Services at CDL and she’ll be telling us what’s going on in this world.
Melvyl: We completed our upgrade to v16.02; we’re working on a backlog of records to load (766,000 records/386 hours); and we added links to the Google Book Search API. Some people have asked why we added the Google Book Search API. The reason is that we have spent many hours and much effort getting our content digitized, and this is a way to showcase that content. Our other option was to get all the records back from OCLC, reload them into local campus OPACs (times 10), upload them to Melvyl, etc. The Google Book Search API was a faster, easier choice.
UC-eLinks: Rebecca Doherty will be the project manager for UC-eLinks and Margery Tibbetts remains our technical manager. Adam Brin will be joining us in June.
July 2007: Newer, cleaner menu window layout.
Nov. 2007: Assessment of new UC-eLinks menu window (recommendations to support direct linking to articles; development is currently underway).
Dec. 2007: UC-eLinks in Melvyl.
This year, 9 of 10 campuses are using the A-Z list as their primary e-journal discovery tool; peak use in March was over 1.1 M requests.
The difference in the menu: in the old service, there were input boxes to fill in. We also heard that people preferred groupings, so we bolded and relocated some items. Overall, people have been happy with the changes.
Coming soon… Google Books target, new eScholarship journals target and WorldCat Local targets.
Request: Sherry Willhite is the project manager.
Nov. 2007: Revised Request handling of missing items.
Dec. 2007: Request began running under the https protocol.
Jan. 2008: Access to the “My ILL Requests” service limited to UC IP addresses; users must log in from on campus or via their campus proxy or VPN.
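The January 2008 restriction amounts to checking a client’s address against the university’s network ranges; off-campus users pass because the proxy or VPN gives them a campus address. A minimal sketch using Python’s standard `ipaddress` module; the function name is hypothetical, and the address blocks are RFC 5737 documentation ranges standing in for the real UC ranges, which are not given here.

```python
import ipaddress

# Hypothetical stand-ins for UC campus address blocks. These are RFC 5737
# documentation ranges, NOT real UC networks.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_uc_address(client_ip: str) -> bool:
    """Return True if the client address falls inside any campus network.

    Users reaching the service through a campus proxy or VPN arrive
    with a campus address, so they pass this check too.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)
```

For example, a request from 192.0.2.15 would be allowed, while one from 203.0.113.7 would be refused.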
Feb. 2008: Statistics enhancement – stats for My ILL Requests became available.
Mar. 2008: Web-based ILL statistics reporting system with output in HTML.
Jan. 2008: UCB began borrowing on VDX.
Features coming soon: a VDX document server hosted at CDL to provide better integration of desktop delivery with VDX. This means that when ILL staff scan a journal article, it will be stored on a server at CDL, and we’ll send out emails with links to these documents. Linking the scanning stations to VDX will eliminate all the hand keying. Also coming: integration of the Request service into the WorldCat Local project.
MetaLib: Women in US Social Movements: we released the pilot last year. Users select resources to search from a portal, use faceted browsing, etc. Graduate students at UCLA determined what was important for their level of searching. In general, people liked what they saw. The principal investigators have completed their report, and it is working its way through the pipeline.
Next-Generation Melvyl Pilot: In 2008, the Next Gen Pilot goes live. The exact date hasn’t been nailed down, but it will probably be in the next few weeks. Why did we get involved? We wanted to move discovery from the local level to the network level; allow users to discover resources beyond our consortium; and let users scope down to the regional, union, or local view. The database size is 100+ M records, growing 10 M per year.
We’ll end up with 10 different URLs (campus-specific views) and an 11th view (the Melvyl view). There will be at least 1 major upgrade (in about 3 months), in which we will add a direct link to the Request feature, which is now available only through the UC-eLinks service menu. There will also be links to local OPACs for circulation/location information.
With this move to the pilot, we’re looking at a new way of supplying services. I mentioned that for Melvyl v16, it took us 3 years to upgrade the software. With WorldCat Local (WCL), there will be constant change.
What about the affiliated and non-UC libraries? I don’t have definite answers for you; it’s just not clear yet how this will happen. If your material is in OCLC, it will still show up in WCL. You just won’t be grouped in the same manner: the local view, the UC libraries, and then the worldwide view.
We need to determine what this means; the decision is technically difficult, and there are cost factors as well. I wanted to show you some of the main features of WCL. Over on the left side you also see options such as Biography, and you can sort by relevance or date. If you haven’t seen any of the other pilot projects, I suggest you look at the University of Washington implementation.
A search for digital content: here is everything in the eScholarship repository. While Melvyl was integrating the Google API, WCL was integrating it as well: “Preview this item from Google Books”.
Question: If you type “Google books” in the search box, do you get all the Google books? Answer: No, not all the Google books.
You can request via Interlibrary Loan (via the UC-eLinks button). I picked this on purpose to show what a work in progress this is. We’re not sure everyone will understand what all the features are (place hold, etc.). There will be 2 campuses involved in usability testing, where we hope to learn what works and what doesn’t, and give that information back to OCLC.
There are lots of ways to give feedback on the pilot: a user survey link in the web banner and (get help) feedback links. We want your feedback, and OCLC wants the feedback. Over the next few months, when this goes live, it’s crucial to let us know what you like.
On the assessment, it is worth noting that we focused on the needs of the end user, not the advanced searcher. The other piece of this is that when we entered into this pilot with OCLC, we structured it as a partnership. OCLC is interested in learning about the academic environment.
Integration with the WorldCat Identities service: for Charles Dickens, for example, you can scroll to see the years when things were published, etc. We get to take advantage of all the services that OCLC creates, and they are integrated seamlessly with the product.
With a simple click of the screen, you can change the screen to one of 5 different languages. The best source of information is http://libraries.universityofcalifornia.edu/sopag/uc_oclc_pilot_implementation/.
Question: Several reference librarians saw a slideshow about Next-Generation Melvyl, and there were a lot of concerns about links to Amazon: why Amazon and not others, like our campus bookstores? There was also some concern that people might buy the book rather than use Interlibrary Loan.
Answer: The ULs debated this question, and the Imp Team had the same concerns. OCLC is interested in adding additional links to local bookstores. Our users go to Amazon anyway, and we won’t know whether this is a good feature or a bad one if we don’t try it. We did a focus group on Shared Print and asked faculty whether they cared where a book was located and how they found books. They go to Amazon first, look at features like “See Inside,” and then go back to their campus OPAC to see if the book is available. It’s a complicated issue. What we tend to find is that our users are really smart.
Comment: I like the book covers. Answer: That was part of the decision: if we turned it on, we got both the reviews and the covers.
Another comment was that the reviews would take away from the scholarly nature of the catalog, for example if someone posted “Awesome”.
Comment: At UCSD, we’ve already done 6 classes for our staff. How do you get a MARC record? Answer: You can’t. Not in this interface. Overall, people seem to love it. We were also told that along with the Amazon link, we could add a link to our local book store.
Patricia: On bibliographic control: what about records in local OPACs that don’t show up in OCLC? There’s a huge inventory of which records are held locally and which at OCLC. It is an ongoing project to get all the locally held records into OCLC. When the pilot goes live, not all of these will be available, but over time this discrepancy will decrease.
Additionally, there are some categories of records that we decided not to deal with in the pilot. We also added some records, such as on-order records, that currently aren’t available in Melvyl. We will continue to look at this issue.
Question: Are we still looking at May 19th as the release date? Answer: I don’t know how to answer that. The Imp Team met yesterday; then it goes to the Exec Team, which will meet Monday and let us know.
Question: Is any campus creating handouts? Answer: There is a guide that OCLC has produced; it is generic so it’s not tailored to UC.
Comment: Our campus is taking things from the launch kit and when our fall semester begins, we’ll do another round. (UCM)
Comment: UCI is doing bookmarks, and we’re inserting it on most of the active search pages of our site. We’ll have a meeting on Monday to start releasing this to the librarians. Maybe this summer, we’ll start on teaching materials.
Digital Special Collections – Rosalie Lack
See Rosalie Lack’s PowerPoint presentation for details.
Digital Special Collections (access; create and support end-user sites such as the OAC, Calisphere, and the UC Image Collections) and Data Acquisitions (ingest; contributor relations; mass digitization projects) were folded into Digital Special Collections.
Some things we’re working on…
We don’t really have a collection development policy for the objects that go into the OAC; we’ll be working on this during the next year.
Another thing we’ve been asked to research is a cost recovery model (for non-UC contributors): what are the costs involved for digital objects, staff time, etc.?
We’ve just updated contributor documentation. Check out the web page at http://www.cdlib.org/inside/projects/dsc/contribute/.
Expanding the tools that people use to give us content. We really want to start in a new mode of offering the campuses more tools to give us content (MOAC toolkit). We want to support more of these.
Investigate contributor training possibilities; how can we help you help us?
New content types: Search within PDFs and audio files.
Zoom and Pan Feature: TIFFs provided by contributors make this possible.
Better integration with other CDL systems, especially with the Digital Preservation Repository.
OAC: 9,000 online finding aids, over 100 contributing institutions from across California
OAC Redesign: http://www.cdlib.org/services/dsc/projects/oac_redesign.html
This web page includes information on the redesign timeline, why the redesign, etc. The prototype will be released in Sept. 2008 and will be fully functioning. We want feedback and then we’ll fix the bugs during the fall. Live launch is expected in Feb. 2009.
The redesign will improve finding aid navigation and display. Users will clearly understand what is available online and what is not, and the purpose of the site and of finding aids, and they will be able to locate information to contact the relevant institution, etc.
New and improved…collection name, number and repository address will be visible at the top. A dynamic area will change based on clicks in the sidebar. A PDF version of the finding aid will be available.
Question: Are the finding aids EAD-encoded? Answer: All the finding aids are EAD-encoded. We’ll also be making MARC collection and item records available, which is new and very exciting. You’ll see on this slide that we’ll have date facets.
This is the item page: you’ll have images online, text, and this offline-item icon for items (like scrapbooks) that you’ll have to go to the collection to see. We probably won’t call it ‘offline’.
You won’t see the huge backend infrastructure updates. You’ll see a modernized display, upgrade to features/functions that are now common on other web sites. The Google map feature will help reinforce that users often need to go to the repository.
Question: The DAO element? Are you using this? Answer: Yes, I’m pretty sure we’re using this.
If you click on the Bancroft Library right now, you get a page like this; you just don’t get a map. The bottom of the finding aid screen will have a comments feature. We haven’t decided whether it should be moderated; users will have to sign in. There will also be more contributor tracking tools.
Calisphere and OAC:
Online Archive of California (OAC): Archivists, historians, researchers
Calisphere: K-12 teachers, general public, undergraduates
Calisphere: Over 200,000 digitized primary sources from UC campuses and institutions across California; it serves the general public and is tailored to meet the needs of K-12 educators.
JARDA: More than 10,000 photographs, oral histories, and other documents on the story of Japanese-American internment.
California Cultures: Content from the 4 major cultural groups of California (African, Asian, Hispanic, and Native Americans).
Calisphere receives approximately 140,000 visits/month (including site visits and image views). The images are open to Google Images; 50% of traffic comes from Google, primarily Google Images.
Previously, our most-used images were of nude bathers from the 1920s; now our most highly viewed item is Mariano Vallejo’s personal accounts (California history).
K-12 Activities: Build more themed collections; marketing (increase usage by K-12), including participation at conference presentations and booths, workshops, and video presentations.
UC Image Service: Shared collections hosted in ARTstor, available to UC only. Make digital images for teaching broadly available for faculty and students UC-wide.
Collection development program proposal to CDC (Phase 1: Local campus collection development, Phase 2: systemwide steering/advisory group for collection development). Visual Resources curators to build collections for fall 2008, work with CDL DPR on preservation, assessment.
Question: Do you still have some promotional materials for Calisphere? Answer: Yes, we have all kinds of stuff. Lorna Lueck is doing an orientation for teachers coming into a credential program. Rosalie Lack: Send me an email with numbers and we’ll get them out to you.
On the Year in Review, you’ll see some quotes that reflect the value of this service. The faculty member who is the Director of the Language Institute told Laine that ‘once again the UC libraries made him proud to be a part of UC’.
Mass Digitization Projects – Heather Christenson
See Heather Christenson’s PowerPoint presentation for details.
As of fall of last year, I became the Mass Digitization Project manager. What I’m going to talk to you about today is where we’re going, what we digitize, etc.
3 Projects, 1 Goal
Goal: Mass digitization of UC Libraries’ book collections.
Google: In-copyright & out-of-copyright works, available via Google search engine.
Microsoft: Out-of-copyright works only, available via Microsoft Live Search Books
Open Content Alliance: Out-of-copyright works only, available via the Internet Archive website and to any and all search engines; library- and grant-funded.
Why are they doing it? Google’s vision: To put all the world’s information online.
Google & Microsoft: To gain market share and competitive advantage for their search (and online advertising) services. “It’s all about Search.”
OCA: To put the world’s information online, for free, forever. “It’s all about the public good.”
Why are we doing it? To enable anyone to discover and access books anywhere, anytime, essentially for free; to support new kinds of scholarship; to preserve and protect our collections; and to explore new collection and access models.
Participant roles: UC Libraries – supply & curate books and bibliographic metadata, supply onsite scanning facilities when appropriate, preserve digital files created.
Microsoft/OCA: Scanning began in April 2006 (books from all the UC libraries), with the Internet Archive as the digitization agent. This is a pick-list-driven approach, limited to public domain works. Scanning centers (30 scanners, called “scribes”) are located at SRLF and the Internet Archive.
Google: Began in October 2006, scanning books from NRLF. Shelf-clearing approach (both public domain and in-copyright books are scanned). UCSC and UCSD have been sending books this year.
CDL’s role: Liaison with partners, planning & coordination, funding, stewardship of digital content, new services (API into Melvyl, working with OCLC).
Here are the stacks at NRLF. First, they evaluate the condition of the book; then the book is checked out (large groups of books can be checked out at the same time). The books are moved to carts and then trucked to Google or sent to the Internet Archive scanning machines. A human being turns the pages and takes the images. Here’s the book cradle, which places glass over the pages to be photographed.
Some books cannot be scanned if they are too brittle or the binding is too tight. Internet Archive is about to start scanning foldout pages. Then the books go back and are checked back in.
Costs to the UC Libraries: Staffing, physical space and facilities, CDL servers for inventory database and digital preservation.
What do we get back? We get back the images themselves, the OCR text, the OCR page coordinates, and metadata.
What books are being digitized? American history, humanities, science, cookbooks, children’s books, East Asian & Pacific Rim collections.
Where can you find UC books? Google Book Search: http://books.google.com/
Microsoft Live Search Books: http://www.live.com/
Internet Archive: http://www.archive.org/details/university_of_california_libraries
Full-text access depends on copyright status, which falls into three date ranges: pre-1923 (public domain); 1923-1964 (“orphan works”); and 1965 to present.
What are the strengths and weaknesses of leading book discovery interfaces? Improved results, ranking, and recommendations; the ability to both browse/winnow and search across full text; the ability to find and display multi-volume works in a meaningful way.
The Espresso Book Machine prints a book on demand in about 15 minutes. What does this mean for us as libraries? By participating in these projects, we’re at the table and making use of these new services and technology.
Question: Do you know yet how you’re going to retrieve the digital files from the Internet Archive? Answer: We do have a tool that they (IA) offered to us (Meta Manager); this is being upgraded right now.
Question: UCSD and UCSC are participating…are other UC campuses coming online? Answer: The course is set by the ULs. The likely process is to look at what is happening now and then the ULs will decide where we go next. UCLA is on-deck next for Google.
Question: Is there any plan for any campus to acquire one of the Espresso machines? Answer: UC was offered a machine to host at UCLA, but at the time it was considered too expensive. But they are getting cheaper: we’re now talking about a $30,000 machine instead of a $100,000 machine. Print-on-demand is available online from other sources.
Question: Whenever I hear Google talk about mass digitization…will they go back and look at the exceptions (like oversize books)? Answer: I think they are eager to broaden their possibilities. Laine Farley: They are just very good at starting and then going back and refining. Some of the complexities are really major, so they opt not to deal with the difficult tasks. Heather Christenson: They have the resources to solve it; it’s just a matter of when they get to it.
Question: Do you know how many times users have discovered a book in Melvyl and then found online copies from one of these services? Answer: We don’t track once users leave the Melvyl interface. One thing I didn’t mention is that CDL is building a database to track the mass digitization process (where are the gaps, what types of books have been left out, etc.). That’s coming online some time this summer.
Question: Are any of the “orphan works” being digitized now? Answer: Google is digitizing them now, but we can’t make them available because of their copyright status. Stanford and others are working on determining the copyright status of books in the 1923-1964 timeframe (the “orphan works”), and we plan to participate in the trials for a pilot of the OCLC Registry of Copyright Evidence. Hopefully we will ultimately be able to place more books in the public domain this way.
Question: What’s the latest on the lawsuits with Google? Answer: We don’t really know. There hasn’t been much news about it.
Round the table – All
What are the most important initiatives taking place at your library?
Why do we like working at CDL? We have great projects and great colleagues. I want to thank the speakers who put these presentations together, as well as Nancy Scott-Noennig, who handled the logistics, and Nadine Graham and Jayne Dickson.
Mitchell Brown, UC Irvine: UCI finished licensing My iLibrary as the aggregator for our ebook collections. We license packages and load individual records into the system. We’re looking at licensing an Oxford package for English literature and the Royal Society of Chemistry Archive.
We’ve started to look at interfaces for web 2.0 technology (dynamically generated user guides). We wanted something that wasn’t the traditional flat study guide; you can attach videos, widgets, etc. This was a sandbox approach. Our campus is looking at licensing a content management system; this is forthcoming. We’ve got some campus digitization projects we’re working on. Hopefully, by next year, we will have our first local collection available.
Lorna Lueck, UC Santa Barbara: We are purchasing ContentDM to manage our digitization projects. Our Special Collections Department has digitized over 5,000 cylinder recordings from the mid-1890s to the mid-1920s (available as MP3 downloads and as streaming audio). The site is available at http://cylinders.library.ucsb.edu/. Special Collections is also featuring an exhibit, Sounds Latino!, on Latino music legends in the California Ethnic and Multicultural Archives (CEMA). The exhibit includes 41 music selections digitized from 78 rpm records and tapes. Visitors can look at the visual collections and then call on their cell phones to listen to the music. For the phone number and list of songs, visit Cemaweb at: http://cemaweb.library.ucsb.edu/cema_exh_present.html.
Michael Oppenheim, UC Los Angeles: The Management Library is becoming a “21st century transformative learning space,” giving over the top floor of the library to the students of the Anderson Graduate School of Management (for study pods and a presentation room), and most of the second floor (for more study pods). We’re moving other collections to other floors. Last fall, we moved our reference desk out of the reference area and relocated it across from an alternate library entrance that has 3 times the foot traffic our main entrance does. Since October, we’ve stopped using our reference collection entirely, and we do everything digitally. Since we moved our reference desk, our statistics have skyrocketed. In late February, the “Common Desktop Initiative” was implemented throughout the UCLA libraries. At public workstations, faculty, staff and students can access the same software available across the campus (complete Microsoft Office suite, etc.). There is some concern about how this change may affect the use of workstations because walk-in users are still permitted.
We’re coping with three major retirements in June: Terry Ryan, AUL for the UCLA Electronic Library; Audrey Jackson, Head of the Science and Engineering Library; and Claire Bellanti, Director of Library Business Services. Recently we hired a preservation librarian, Jacob Nadal, who joins UCLA in early June from a position at the New York Public Library.
Kris Veldheer, Graduate Theological Union: Since we serve 9 separate campuses, GTU is going through a transition. We’re taking on the archival records of a number of our member schools and, at the same time, looking to digitize these resources. We’re also looking at re-tooling and giving ourselves a greater digital presence. We’re buying more databases through SCELC and buying MOODLE.
Sara Davidson, UC Merced: Our library is redesigning its website; a company is setting this up, working on the back end, and trying to make our site more dynamic. We’re also setting up a research blog where students can ask questions out in the open. Assessment: we’re collaborating with the writing program, with the goal of reaching all the students who take Writing 10, and using RefWorks, the citation management system we subscribe to.
We’re also collaborating with a faculty member on hosting a virtual reality space in the library. In addition, we’re looking at using eScholarship to host a virtual journal highlighting the work our undergraduates are doing.
Isom Harrison, Lawrence Livermore Nat’l Lab: We’re digitizing our unlimited, unclassified reports collection and making it available to the public, but after 9/11 we had to pull some reports back. We have a real space problem; we get charged for space. We’ve also had cuts. We’re working on re-purposing our space and moving on. Some managers want to get rid of all the bound journals, but we have quite an investment in them.
Jose Olivares, Lawrence Berkeley Nat’l Lab: We’re in about the same boat [as LLNL]. We recently finished a new research reports submission system that collects bibliographic data and full-text research reports. We have a catalog of bibliographic data about research reports going back to 1939. The old hard copy reports are being digitized. We’re also looking at our photo archive collection and trying to digitize as much of it as we can. One area that no one is looking at is the preservation of data sets, and these are being lost.
Melissa Browne, UC Davis: Our technical services departments are undergoing a reorganization. They are changing their structure from a format-based model to a function-based one. The new departments will be responsible for acquisitions and cataloging.
On the public services side, we have 3 librarians who will be attending the ACRL Immersion program this summer in San Diego. Like other places we’re coping with budget cuts.
Rebecca Morin, California Academy of Sciences: We’re moving into our new building, a purpose-built space back in Golden Gate Park. About a quarter of a million volumes have to be moved from Howard St. to the park. We’ve temporarily suspended several services (including ILL); we hope to be available for library services on July 1 and open to the public on Sept. 28. For the first time we have a dedicated digital lab and staff. We’re working on digitizing all Academy publications, a joint venture with the IA. You should be able to find 350 CAS records on the IA.
Ira Bray, California State Library: The State Library just signed a letter of intent to migrate from DRA Classic to Ex Libris. Personnel changes: Tom Andersen has moved to State Library Services as Bureau Chief. Gerry Maginnity is the new Bureau Chief for Library Development Services. We’re just winding up decisions for the Library Services and Technology Act (LSTA) grants for FY 2008/09. We’ll get these funded as soon as the state budget passes. The State Library will be undergoing a 10% cut just like everyone else. Our historic Library and Courts I building is being retrofitted, so our collections will be moving in 09/10.
Marcus Banks, UC San Francisco: Similar to UCLA, the journals on our second floor will be moved so that the clinical simulation center can move in. We’ll be re-purposing our teaching lab. We’ll have a new content management system for our web site and will refresh the look and feel of the web site.
Charleen Kubota, UC Berkeley: I’m happy to report that UCB has just completed a 9-month discovery process, which is the first phase of a three-stage New Directions Initiative. This is an exciting grassroots process designed to enable the Library to “understand and adapt to the evolving information needs of our users”. There have been a series of town hall meetings with presentations by speakers from academia and the private sector. Links to presentation webcasts can be found on the New Directions website and can also be found by searching “New Directions Berkeley” on YouTube. Library staff conducted interviews with faculty and students and used their feedback to inform the “next steps” area. The New Directions Steering Committee identified 165 tasks, of which 26 are considered “key” or “quick impact” starting points for UCB. The hot action items are e-science, e-everything; assessment; copyright; culture/innovation; digital library for managing assets managed or licensed by the Library; digital preservation; develop the hybrid/integrated library professional; create a Discovery team to “implement next generation discovery tools”; assess library spaces; marketing library services; support faculty publishing and digital collection building; and integrating on-going training into the Library’s culture. You can track the progress of this process at the New Directions website or submit comments to the New Directions blog.
Berkeley has been moving forward with some innovative projects. The Berkeley Research Impact Initiative (BRII) pilot project subsidizes fees charged to authors who want to publish in open access or paid access publications. We joined the Ask A UC Librarian chat reference cooperative last fall. The “Nuggets of Innovation” project list is on the New Directions website.
Leah Prescott, Getty: The Sloan Foundation funded mass digitization by the IA/OCA. We are digitizing books on Pompeii and Herculaneum and general books on antiquities and archeology. Right now we have about 1,800 books available. When the project is finished we will have upwards of 4,000-5,000. We’ve just had a scribe scanning station installed onsite to handle our fragile books. We have 2,000-3,000 Julius Shulman photographs; we’re going to pull these from our digital management system and submit them to Calisphere.
We’re in the process of implementing DigiTool, which is how we’ll be creating our METS records for Calisphere. We’re going to be working with the supercomputer center in San Diego on their iRODS project. Last year, as part of our scholar year, we digitized a definitive 18th century work on comparative religion published in four languages (9 volumes in just the French version). At the end of that scholar year, we created a Confluence wiki where the scholars could collaborate on research into the origins of the illustrations and into the differences between the language editions. We have transferred this wiki over to UCLA, where they’re doing TEI markup of it. We’re working on an ebook application for an exhibit of Russian Avant-Garde books and for other exhibitions and uses in the future. We have a California video exhibit up right now, which meant doing a lot of digitization of video. We’re creating a dedicated video viewing room. We’re dabbling with audio files (working to set up a podcasting system). PennTags software has been installed, and we are developing a research strategy to compare social tagging with traditional cataloging and to see where the Getty vocabularies might fit.
Penny Coppernoll-Blach, UC San Diego: Redesigned our public website; Google Book Search Library Project; working with the Archivists’ Toolkit project; NIH publication requirements; doing a lot more with Facebook to reach our medical students; doing more wikis and blogs. Our Social Sciences & Humanities Library weeded out 70% of their reference collection. We’ve taught 6 classes on Next Generation Melvyl for our library staff.
General questions; wrap up– All
Please bring any comments or questions on any of the topics above, or on other CDL products and services.
Question: Wikipedia entry, UC-eLinks button? Answer: It’s really the use of the Firefox extension and Wikipedia. We’ve been loading Calisphere content into Wikipedia.
Comment: They were disappointed that the Resource Liaisons meeting wasn’t held this year. It would have been an excellent time to thank Terry Vrable for all her hard work. Response: We’re recruiting for Terry’s position; we also were disappointed not to have had a Resource Liaisons meeting.
Comment: Regarding the additional EBSCOhost databases, you didn’t mention Business Source Complete. It has a great deal more content. Response: I will send out an addendum to this.
Comment regarding ARTstor from UCI: One group (materials science) wanted the ability to ingest their own images, such as details of specific crystals. They’ve been very receptive and find the interface easy to use for storing images. There’s also a section in Chemistry…This is an example of repurposing the content in lots of different ways.