Merritt Service Update: January 2015
- Merritt experienced significant problems with ingest and retrieval beginning around January 21. Objects larger than 100 MB submitted to ingest failed most often, but smaller objects failed as well. We isolated the problem to the storage server. The storage server is distinct from the actual storage node where the content is stored; the storage server directs ingested content to, and retrieves requested content from, the correct storage node. No content already stored in Merritt was affected, although some retrieval requests failed.
We had seen some sporadic failures prior to January 21, but the problem became more significant when we migrated from a physical server to a VM on that date. We determined that the problem was related to the Solaris operating system used by both of these servers.* The problem caused extremely slow I/O, which in turn caused timeouts during ingest. This is why it affected larger objects more often: with slow I/O, they were more likely to hit the timeout limit.
We originally chose the Solaris operating system for its ZFS file system, which we used with the DFlat file system convention in Merritt. However, since we moved our primary storage node to the SDSC Cloud, we no longer use DFlat. Thus, we no longer needed ZFS or Solaris, and could move to the SLES operating system that we use on our other production servers.
We migrated to the SLES VM on February 5th, and the problems with ingest failures went away. We have also noticed better I/O performance.
We appreciate the patience of curators, who needed to resubmit content that had failed during this period. Please let us know of any concerns or questions.
* For the “insatiably curious” (as our now-retired colleague John Ober used to say), the problem seems to have been related to the E1000G network interface controller emulator under the Solaris VM.
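The link between slow I/O and timeouts described above can be illustrated with a rough sketch. The timeout value, throughput figures, and function names below are hypothetical and chosen only for illustration; they are not Merritt's actual settings or code.

```python
def transfer_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Estimate how long a transfer takes at a steady throughput."""
    return size_mb / throughput_mb_per_s


def exceeds_timeout(size_mb: float, throughput_mb_per_s: float,
                    timeout_s: float) -> bool:
    """Return True if the estimated transfer time is longer than the timeout."""
    return transfer_seconds(size_mb, throughput_mb_per_s) > timeout_s


TIMEOUT_S = 60.0       # hypothetical ingest timeout, in seconds
HEALTHY_MB_S = 50.0    # hypothetical throughput with normal I/O
DEGRADED_MB_S = 1.0    # hypothetical throughput with very slow I/O

for size_mb in (10, 100, 1000):
    healthy = exceeds_timeout(size_mb, HEALTHY_MB_S, TIMEOUT_S)
    degraded = exceeds_timeout(size_mb, DEGRADED_MB_S, TIMEOUT_S)
    print(f"{size_mb} MB: times out when healthy={healthy}, when degraded={degraded}")
```

With these assumed numbers, only the degraded case pushes the larger objects past the limit. Real throughput fluctuates, which is consistent with some smaller objects failing as well.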
- With this problem fixed and I/O performance improved, we have begun to replicate content to UCLA IDRE storage. We do not yet know how long this will take, but we should have an estimate soon.
Merritt Service Description
Merritt is a production-level service that provides the UC community with an easy-to-use tool to manage, archive, and share their content. Content can be deposited and managed via a user interface or an API.
Merritt Service Manager
- Perry Willett firstname.lastname@example.org
Please contact us with any questions or general correspondence.
Merritt Training Materials, Guides, FAQs and Webinars
Service Monitoring and Availability
Check CDL’s system status page at http://www.cdlib.org/contact/system.html.