DLF logo DLF logo

About

Architectures, systems and tools

Digital preservation

Digital collections

Standards and practices

Use and users

Roles and responsibilities

DLF Forum

Publications and resources

Search CLIR and DLF web sites

University of California, Berkeley

Report to the Digital Library Federation
Fall, 2003



Table Of Contents



I. Collections, services, and systems

II. Projects


I. Collections, services, and systems


A. Collections


The UCB Library has created or collaborated in the creation of nearly 200 collections of digital content. An inventory of these collections, with brief descriptions and links to the collections themselves, is maintained by the Digital Publishing Group (DPG), a unit of the Library Systems Office. It is available on the DPG Web site at the following URL:

http://dpg.lib.berkeley.edu/scripts/webdb/dpg/collbrw.plx


B. Services


Remote access


The UC Berkeley Library provides a proxy service by means of which members of the UC Berkeley community can access licensed content from off-campus locations. Users authenticate using either their library card number and a PIN, or their CalNet ID and passphrase. CalNet authentication is the campus standard for Web applications and uses the UCB Authentication Web Server, a Kerberos-based service developed and maintained by the Systems and Network Security group. The Library’s proxy service is heavily used, handling several million documents per month.

http://proxy.lib.berkeley.edu


C. Systems


GenDL

The UC Berkeley Library began implementing GenDL, its digital library support architecture and system, in conjunction with the DLF sponsored Making of America II initiative on which it was the lead institution. This initiative developed the first standard encoding for library digital objects (the MOA2.DTD) against which the library could develop such a suite of digital library tools. Now that the METS schema, which builds on the MOA2 work, has superseded the MOA2.DTD, the UC Berkeley Library has been working to convert its MOA2-based tool suite to produce and present METS objects. It has also been enhancing GenDL in ways that are independent of the core encoding standard.


The GenDL tool suite consists of 4 components: GenDB, for gathering the raw structural, descriptive and administrative metadata pertaining to digital materials, GenX for creating standards-based digital objects from the raw metadata, GenSearch for searching the library’s repository of digital objects, and GenView for viewing and navigating the library’s digital objects.


As first implemented, GenDB was built on MS Access, and the Library Systems Office produced a separate Access-based GenDB database for each digitizing project and project participant. Over the past 2 years the Library Systems Office has worked to implement a version of GenDB with a centralized database, and an intuitive and highly configurable Web interface. This work is now substantially complete, and the OAC administered California Cultures project is actively using the new version of GenDB. The University of California History Digital Archives, and Environmental Design Archives will begin using this new tool over the next couple of weeks. As Finding Aid support is added, and project management features are improved, WebGenDB is expected to support all UC Berkeley digitizing projects. WebGenDB/GenDB is largely neutral with respect to encoding standards. However it has also been adjusted better to support METS, MODS and MIX output now that these are emerging as the primary standards for target encodings.


GenX, which extracts the raw metadata from the GenDB database to produce standards-based digital objects, previously produced MOA2 objects. A recently revised version of GenX now produces METS objects, with MODS encoded descriptive metadata, and MIX encoded image technical metadata. The new GenX cannot be finalized, however, until appropriate schemas for technical metadata and rights metadata emerge and/or mature.


GenSearch is the least developed component of the Library’s tool suite. The Library Systems Office has performed fairly extensive testing of an XML database (Tamino) as a possible support for digital library searching functions. However Tamino’s support for XML Schema in general, and METS and MODS in particular, is currently immature and requires that the schemas be rewritten to be loaded. Recent GenSearch research has focused on Cheshire II, and the forthcoming Cheshire III.


GenView has provided library patrons with the ability to view MOA2 objects since the conclusion of the MOA2 initiative. A new version of GenView recently put into production supports METS objects as well as MOA2 objects. Library Systems staff members have also been working substantially to redesign the GenView interface to make it more intuitive and user friendly. It will begin implementing these enhancements in the very near future.


XML Search Tools

With the increase in number of digital collections hosted by the UCB library, providing our patrons with a reliable and efficient search tool has become more important. It has also been essential to find one compatible with the METS standard now used to describe objects' technical, administrative, and descriptive metadata. The SAG XML database Tamino was tested for the purpose, but its inability to handle the full METS schema without modifications prompted us to look at other products, including Cheshire II, an XML search engine developed at UCB’s School of Information Management and Information Management and Systems by Ray Larson.


Tests performed to date include the indexing of a million MODS records, which showed a excellent performance for retrievals. The user interface API made it easy to create a web search page. Compatibility with METS was also good. Particularly interesting is Cheshire's support of both Boolean and probabilistic "best match" ranked searching and browsing. Unicode support is not available yet in Cheshire II, but it is expected in the next release, Cheshire III, which is currently under development. Additional information can be found at the Cheshire II Web site.
http://cheshire.berkeley.edu/


II. Projects


Digital Publishing Projects

The Digital Publishing Group (DPG) manages and provides technical and other forms of support for a wide variety of UC Berkeley’s digital publishing projects. The DPG maintains an alphabetical list with brief descriptions of current projects at the following URL.
http://dpg.lib.berkeley.edu/scripts/webdb/dpg/project.plx/


LOCKSS at Berkeley


The UC Berkeley Library is participating as a tester for Stanford University’s LOCKSS system. LOCKSS stands for "Lots of Copies Keep Stuff Safe” and the goal of the project is to create low-cost, persistent digital caches of authoritative versions of Web-delivered content. The LOCKSS software enables institutions to locally collect and archive content, while enforcing the publishers’ access control systems and without harming their business models. The accuracy and completeness of individual LOCKSS caches are assured through a peer-to-peer polling system which is both robust and secure. LOCKSS replicas cooperate to detect and repair preservation failures. LOCKSS is designed to run on inexpensive hardware and to require almost no technical administration. The software has been under development since 1999 and is now distributed as open source. Further information is available at the following URL:
http://lockss.stanford.edu/




Please send comments or suggestions.
Last updated: December 14, 2003
© 2003, Digital library Federation, Council on Library and Information Resources

CLIR CLIR Home Page