Merritt Service Update: July 2014
Recent Enhancements, News, and Activities
- SDSC update. We have successfully resolved all the issues with the storage node at the San Diego Supercomputer Center (SDSC) Cloud Storage service, and are now sending all newly submitted content to SDSC. We are working on a plan to migrate all the content currently stored in the UCOP Data Center to SDSC. We’ll be sending out more details shortly.
- Replication at UCLA. We are working on replicating all of our content at the UCLA Cloud Service. We will be planning for this in conjunction with the content migration to SDSC.
- Problem with UTF-8 chars in pathnames resolved. We encountered a problem with object components containing UTF-8 characters in the filename or pathname. We found 27 objects that could not be retrieved because the filename or pathname in Storage differed from how it was represented in Inventory. In some cases, a question mark had been substituted for a multi-byte character. We discovered this problem through the routine audit checks that we conduct continuously. All of the problems with the objects we have identified so far have been addressed. Please let us know if you encounter any problems.
Details about our process:
- Our first step was to ensure that we had consistent UTF-8 support throughout all of Merritt’s servers and services, including ingest, storage, inventory and the user interface (UI). Any content submitted with UTF-8 chars in the filenames will now be handled correctly.
- Then we worked to fix the objects that couldn’t be retrieved. Working with curators, we discovered that in some cases, the filenames had actually been mangled by other systems prior to Merritt submission. In other cases, we were able to fix the objects so that they could be retrieved.
- Good practice for file naming. If curators have control over file naming, it is better on the whole to use an unaccented ASCII character set for filenames and pathnames. There will be cases where curators are submitting content created by others, and Merritt can support them. But even though the Unicode standard was first published over 20 years ago, general support for Unicode character sets in various software and operating systems remains problematic. For instance, once you download an object from Merritt, you might have trouble uncompressing it with some Zip or GZip clients if the filenames contain UTF-8 characters.
- Fixed 32-bit limit on tar. We discovered a bug that prevented anyone from downloading very large objects. Large objects can take a long time just to package in preparation for downloading. Rather than make users wait, we prompt them for an email address and contact them to let them know when their objects are ready to download. We found that objects larger than 8 GB were failing because of a configuration limitation with the Java library for compression that we were using. This is now fixed.
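For curators who want to check filenames against the naming guidance above before submitting, the following is a minimal Python sketch. It is not part of Merritt; the function names and sample paths are hypothetical, and the question-mark check is only a heuristic based on the substitution pattern described earlier in this update.

```python
from pathlib import PurePosixPath
from typing import List


def non_ascii_components(path: str) -> List[str]:
    """Return the path components that contain any non-ASCII character."""
    return [part for part in PurePosixPath(path).parts if not part.isascii()]


def looks_mangled(path: str) -> bool:
    """Heuristic only: a literal '?' or U+FFFD (the Unicode replacement
    character) may mark a spot where a multi-byte character was lost."""
    return "?" in path or "\ufffd" in path


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    samples = [
        "data/report_2014.csv",
        "données/résumé.txt",
        "data/r?sum?.txt",
    ]
    for candidate in samples:
        print(candidate, non_ascii_components(candidate), looks_mangled(candidate))
```

A path flagged by `non_ascii_components` is still valid for submission; the check only highlights names that may cause trouble in other tools after download. A path flagged by `looks_mangled` is worth comparing against the original source, since a question mark can also be a legitimate part of a name.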
Merritt Service Description
Merritt is a production-level service that provides the UC community with an easy-to-use tool to manage, archive, and share their content. Content can be deposited and managed via a user interface or an API.
Merritt Service Manager
- Perry Willett firstname.lastname@example.org
Please contact us with any questions or general correspondence.
Merritt Training Materials, Guides, FAQs and Webinars
Service Monitoring and Availability
Check CDL’s system status page at http://www.cdlib.org/contact/system.html.