CDL Releases eXtensible Text Framework (XTF) Version 2.2
By Elise Proulx
Publishing Group Outreach & Marketing Coordinator
The Publishing Group of the California Digital Library (CDL) announces the release of version 2.2 of its search and display technology, the eXtensible Text Framework (XTF).
XTF is an open source, highly flexible software application that supports the search, browse, and display of heterogeneous digital content. XTF offers efficient and practical methods for creating customized end-user interfaces for distinct digital content collections.
XTF 2.2 Release Highlights
- A “sub-document” feature for fine-grained searching of structured texts, useful for documents made up of many unrelated parts (e.g., books of poetry).
- Index enhancements, including:
  - Simple index validation, such as checking the number of hits a query returns on a given page or across the whole index. Validation helps ensure that only high-quality search indexes reach the end user.
  - Support for index rotation, so that end users never see a partially completed index.
  - A new indexer option (“-force”) that forces indexing regardless of whether objects have changed.
- Efficiency improvements, including profiling support for search-results stylesheets and new caching options.
- Enhanced PDF support: stylesheets can now generate a PDF file instead of an HTML page, or append to an existing PDF file, using XSL-FO (Formatting Objects) with the Apache FOP library.
- Maintenance of the XTF source code in Mercurial (hg). This eases development and lets XTF users maintain a unified code base, with the primary XTF code in the default branch and their own changes in another, allowing for more seamless upgrades.
- Support for ERC metadata output from crossQuery.
- Support for OpenURL resolving in crossQuery stylesheets.
XTF version 2.2 is now available for download, along with viewable documentation and a complete list of upgrades, on the XTF Project page on SourceForge.
A Powerful, Flexible Access Platform
Commissioned by the CDL to be the primary access tool for its collections, XTF offers a powerful, flexible platform for delivering digital content. It consists of Java and XSLT 2.0 code that indexes, queries, and displays digital objects.
The software allows end users to:
- Search using Boolean operators, truncation/wildcard operators, and exact phrases.
- Perform structure-aware searching (e.g., search only this chapter) and view search terms in context.
- Browse hierarchical facets.
XTF provides developers the following benefits:
- Simple deployment: Drops right into a Java application server such as Tomcat; has been tested on SunOS, Linux, and Windows.
- Easy configuration: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT.
- Robustness: Optimized to perform well on large documents (e.g., a single text exceeding 10 MB of encoded text); scales to collections of millions of documents; provides full Unicode support.
- Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth).
- Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
- Can power other digital library services (e.g., OAI-PMH data provider that allows others to harvest metadata, SRU interface that exposes searches to federated search engines).
- Modular components can be deployed as separate pieces of a third-party system (e.g., the module that displays snippets of matching text).
National and International Use
XTF is a widely used technology, both within the CDL and worldwide. A small sampling of its implementations:
- eScholarship access interface (http://www.escholarship.org), developed by the Publishing Group of the CDL.
- Mark Twain Project Online (http://www.marktwainproject.org), co-developed by the Mark Twain Papers Project, the CDL, and the University of California Press.
- The Encyclopedia of Chicago (http://www.encyclopedia.chicagohistory.org/), co-developed by the Chicago History Museum, The Newberry Library, and Northwestern University.
- The Chymistry of Isaac Newton (http://webapp1.dlib.indiana.edu/newton/) and The Swinburne Project (http://webapp1.dlib.indiana.edu/swinburne/www/swinburne/), Indiana University.
- Frontiers of Science (http://frontiers.library.usyd.edu.au/), University of Sydney Library.