University of Pennsylvania
Report to the Digital Library Federation
October 15, 2000


  • Collections, Services, and Systems
  • Projects and Programs
  • Specific Digital Library Challenges

    I. Collections, Services, and Systems

    A. Collections

    The Schoenberg Center for Electronic Text and Image (SCETI)
    SCETI provides the scholarly community with web access to virtual facsimiles of original texts, documents, and sources from Penn's collections. SCETI now includes eleven specialized collections that provide digital views of printed books, manuscripts, photographs, artwork, maps, broadsides, ephemera, and recorded sound. Founded in 1996, SCETI continues to introduce new methods for producing and presenting digital resources. Recent additions to SCETI include new text facsimiles for the Furness Shakespeare library, and accompanying multimedia teaching aids that demonstrate the technique and interpretation of Shakespeare's works. SCETI has also recently added several digitized versions of medieval manuscripts (including illuminations) to its Lawrence J. Schoenberg digital collection.

    The Oxford University Press History Books Project
    The Oxford University Press History Books project is making newly issued scholarly history books available electronically to the Penn community. The project is studying digital book use and its impact on teaching, learning, and the economics of publishing. Currently over 100 books are available, with thousands expected by the end of the 5-year study. Descriptions of the project, its catalog, and three sample books, are available for the general public to browse.
    http://verity.library.upenn.edu/search97cgi/s97_cgi?Action=Search&collection=oup &access_type=public&resultcount=5000&OupQueryType=title&ResultTemplate=oup/home3 books.hts

    Freedman Archive
    The Freedman Archive site is an example of the integration of digital and nondigital resources, and of the migration of digital technologies we hope to support on a larger scale in the years to come. Users can visit the Freedman collection web site to browse and search a multilingual catalog of over 27,000 recordings of Jewish music. A few of these recordings are available as digital sound samples; the others can be listened to offline at the Archive. The digital catalog, using dBase IV tables with customized character encoding, has been migrated to web-searchable forms with standard Unicode encoding for Yiddish and Hebrew text. We expect to reuse the tools used in this migration for other projects.
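
    As a rough illustration of this kind of encoding migration, the following minimal sketch decodes a legacy field byte by byte into Unicode. The byte values in the table and the function name decode_legacy are hypothetical, chosen only for the example; they are not the Freedman catalog's actual encoding or the tools used in the migration.

```python
# Hypothetical legacy code page: these byte assignments are invented for
# illustration and are NOT the actual encoding of the Freedman catalog.
LEGACY_TO_UNICODE = {
    0x80: "\u05D0",  # HEBREW LETTER ALEF
    0x81: "\u05D1",  # HEBREW LETTER BET
    0x82: "\u05D2",  # HEBREW LETTER GIMEL
    0x83: "\u05E9",  # HEBREW LETTER SHIN
    0x84: "\u05DC",  # HEBREW LETTER LAMED
    0x85: "\u05D5",  # HEBREW LETTER VAV
    0x86: "\u05DD",  # HEBREW LETTER FINAL MEM
}


def decode_legacy(raw: bytes) -> str:
    """Decode a legacy field: ASCII passes through, high bytes use the table."""
    chars = []
    for byte in raw:
        if byte < 0x80:
            chars.append(chr(byte))
        elif byte in LEGACY_TO_UNICODE:
            chars.append(LEGACY_TO_UNICODE[byte])
        else:
            # U+FFFD marks unmapped bytes so they can be reviewed by hand.
            chars.append("\uFFFD")
    return "".join(chars)


# A four-byte legacy field: shin, lamed, vav, final mem.
record = bytes([0x83, 0x84, 0x85, 0x86])
title = decode_legacy(record)
utf8_bytes = title.encode("utf-8")  # standard encoding for web delivery
```

    Marking unmapped bytes with the replacement character, rather than dropping them, keeps gaps in the mapping table visible during review.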

    A Celebration of Women Writers
    The Penn Library is also hosting A Celebration of Women Writers, a collection of electronic text transcriptions (many with illustrations) edited by volunteer Mary Mark Ockerbloom. This sister site to The On-Line Books Page has digitally republished over 130 books by women on-line, free for all to read. The Celebration's publications cover English-language women's writings in all genres, but especially emphasized are fiction and poetry, children's literature, Canadian authors, and personal accounts of important historical periods and figures. The site also includes a browsable database of other writings freely available on the Internet by and about women writers.

    B. Services

    Cross-collection Searching
    A cross-collection search service allows users to find digital resources in any of dozens of databases, without having to seek out each individual database, and go through each one's unique user interface. Our first version of this service is now available as QuickSearch, implemented using software provided by OCLC. Eventually we hope to have more powerful cross-collection searches, through this and other tools, with more databases, more flexible interfaces, and the ability to refine searches to specific user needs.
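
    The basic idea of cross-collection searching can be sketched as one query fanned out to several collections, with the hits merged into a single list. This is only an illustration under simple assumptions: the collection names, titles, and the functions make_searcher and cross_search are hypothetical, and the sketch does not reflect how QuickSearch or the OCLC software works.

```python
from typing import Callable, Dict, List, Tuple

# A searcher takes a query string and returns matching titles.
Searcher = Callable[[str], List[str]]


def make_searcher(titles: List[str]) -> Searcher:
    """Build a toy searcher over an in-memory list of titles."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search


def cross_search(query: str, collections: Dict[str, Searcher]) -> List[Tuple[str, str]]:
    """Send one query to every collection and merge hits as (title, collection)."""
    hits = []
    for name, search in collections.items():
        for title in search(query):
            hits.append((title, name))
    return sorted(hits)


# Hypothetical collections and contents, for illustration only.
collections = {
    "Collection A": make_searcher(["Hamlet", "Macbeth"]),
    "Collection B": make_searcher(["Letters on Hamlet", "Poems"]),
}
results = cross_search("hamlet", collections)
```

    Each hit carries the name of its collection, so a user can see where a result came from without visiting each database's own interface.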

    Electronic Journals and Databases Search Tools
    An electronic journal search tool, made public this fall, gives patrons a much more powerful discovery tool for our electronic journals than our previous static Web pages provided. Since we now subscribe to over 3000 such journals, finding journals that are relevant to a particular field of study can be difficult. The search system allows patrons to search:
    • by known title or keyword in title
    • by community of interest (see below) or by community clusters
    • by journal publishing associations like the Association for Computing Machinery or vendor-supplied aggregations like Journals@OVID
    • by format of journal articles - full text, page images or not full text
    • by access restrictions (useful for non-Penn users who want to see which electronic journals they can read without a subscription)
    • or alphabetically by title

    A similar tool is now under development for electronic databases, using similar searching criteria.

    Electronic Reserves
    Electronic reserves were introduced as a pilot service in the 1999-2000 academic year, for the delivery of course reserve materials over the local network in digital form. We will be supporting electronic reserves on a regular basis starting in the 2000-2001 academic year. The service includes in-house rapid scanning facilities, integration with the Franklin catalog and with on-line courseware, and access control to ensure that we stay within fair-use limits.

    New Library Materials
    Our New Materials notification service, currently under user testing as "New Books Plus", lets users find recently acquired materials according to a variety of criteria (including name, topic, language, format, and date of acquisition of the materials). Users can also view a complete listing of recent acquisitions.

    Try This Out!
    "Try This Out!" is Penn's prototypes page, introduced this summer. Library users can go to this page to get access to services still in development that may be of use to them. They can also give feedback to us and suggest ways to improve the prototypes. Library staff have access to a similar page for projects that we want to test internally but not yet release for public use.

    The On-Line Books Page
    Penn's Library web site now hosts The On-Line Books Page, the Web's oldest Internet-wide index of free-to-read books, edited by John Mark Ockerbloom of the Library. The On-Line Books Page provides a searchable index of over 12,000 freely accessible digital books in English now available on the Internet. It also has links to major electronic text archives, and posts information on how individuals and groups can create more on-line books to add to the growing collection of freely-readable books on the Internet.

    C. Systems

    Persistent References
    Our persistent references work allows us to use more stable, high-level references to digital resources than the fragile URLs of the World Wide Web. We have installed the new version of CNRI's Handle service, and plan to introduce Handles to identify, and help locate and describe, many of our locally managed resources. We are developing software to facilitate the maintenance of locally defined Handles. In the future, we also hope to support persistent references at the citation level (that is, dynamically resolved references based on descriptive information like title, author, and publication details) as well as the opaque, one-to-one identifiers provided by Handles.
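
    The indirection that makes such references persistent can be shown with a small sketch: a stable identifier is looked up in a table that records the resource's current location, so the location can change while the identifier stays the same. The HandleTable class, the prefix 12345, and the example URLs are hypothetical; this is not the CNRI Handle software or its API, only an in-memory illustration of the idea.

```python
class HandleTable:
    """Toy resolver: maps stable 'prefix/suffix' identifiers to current URLs."""

    def __init__(self):
        self._locations = {}

    def register(self, handle: str, url: str) -> None:
        """Create or update the location recorded for a handle."""
        prefix, sep, suffix = handle.partition("/")
        if not (prefix and sep and suffix):
            raise ValueError("a handle must look like 'prefix/suffix'")
        self._locations[handle] = url

    def resolve(self, handle: str) -> str:
        """Return the current location; raises KeyError if unknown."""
        return self._locations[handle]


table = HandleTable()
# Hypothetical prefix, suffix, and URLs, for illustration only.
table.register("12345/catalog-001", "http://old-server.example.org/catalog")
first = table.resolve("12345/catalog-001")

# The resource moves: only the table entry changes, not the published handle.
table.register("12345/catalog-001", "http://new-server.example.org/catalog")
second = table.resolve("12345/catalog-001")
```

    Documents that cite the handle keep working after the move, because only the table entry had to be maintained.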

    The Typed Object Model (TOM)
    The Typed Object Model allows us to describe the structure and behavior of a wide variety of data formats and information services. The system was originally developed at Carnegie Mellon, where it still drives a popular web-based conversion service. At the Penn Library, we have released the core TOM software as open-source. We plan to use it to document our own data formats and services, assist in data format migration and other conversions, and provide uniform application-level interfaces to heterogeneous data services.
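
    One way to picture format migration through a registry of known conversions is as a search for a chain of converters between two formats. The sketch below is a generic breadth-first search over a hypothetical table of conversions; the format names and the function find_conversion_path are assumptions made for illustration and do not represent TOM's actual interfaces.

```python
from collections import deque
from typing import Dict, List, Optional


def find_conversion_path(
    conversions: Dict[str, List[str]], source: str, target: str
) -> Optional[List[str]]:
    """Return the shortest chain of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for nxt in conversions.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical registry: each key lists the formats it can be converted to.
conversions = {
    "dbase": ["csv"],
    "csv": ["xml", "html"],
    "xml": ["html"],
}
path = find_conversion_path(conversions, "dbase", "html")
missing = find_conversion_path(conversions, "html", "dbase")
```

    Breadth-first search returns the chain with the fewest steps, which matters because each conversion step can lose information.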

    II. Projects and Programs

    Citation Linking
    We are embarking on a citation linking project that aims to let readers find literature cited in scholarly works by clicking on citations in the document's bibliography and footnotes. By using citations directly, instead of relying on the opaque document identifiers assigned by some other reference linking systems, we hope to develop tools that cover a wider range of scholarly literature, and give users more options in finding cited works and related resources. As part of the project, we hope to implement a context-sensitive system that automatically identifies and parses citations in ordinary digital monographs, and embeds links to services that can then return digital documents, catalog records, or other resources related to the citation. (We may use third-party software for some of these components.) We believe that a powerful citation service may make research significantly more productive.
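
    A minimal sketch of the parse-then-link step might look like the following. It assumes one simple citation layout ("Author. Title. Place: Publisher, Year."), a made-up citation, and a hypothetical resolver address; real citations vary far more than a single pattern can handle, and this is not the project's implementation.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlencode

# Matches only the simple layout "Author. Title. Place: Publisher, Year."
CITATION = re.compile(
    r"^(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\.\s+"
    r"(?P<place>[^:]+):\s+(?P<publisher>[^,]+),\s+(?P<year>\d{4})\.$"
)

# Hypothetical link service; not a real resolver address.
RESOLVER = "https://resolver.example.org/find"


def parse_citation(text: str) -> Optional[Dict[str, str]]:
    """Split a citation into descriptive fields, or return None if it does not match."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


def citation_link(fields: Dict[str, str]) -> str:
    """Build a query link from descriptive fields rather than an opaque identifier."""
    query = {key: fields[key] for key in ("author", "title", "year")}
    return RESOLVER + "?" + urlencode(query)


# A made-up citation, for illustration only.
fields = parse_citation("Doe, Jane. An Example Monograph. Philadelphia: Example Press, 1999.")
link = citation_link(fields)
```

    Because the link carries descriptive fields, the receiving service is free to return a digital copy, a catalog record, or related resources, depending on what it can find.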

    Communities of Interest Initiative
    Communities of interest, built around scholarly disciplines or interdisciplinary collaboration, can form important focuses for library development. We have defined a set of communities of interest at Penn, and are now using them to develop services that allow users to locate resources relevant to particular communities of interest. (For example, our new materials and electronic journals services allow searching for materials of interest to specific communities.) We also are considering developing services to help people in these communities collaborate, and select especially relevant resources.

    Digital Images
    The digital images project is a system to manage and deliver digital images for use in teaching and research. It uses enhanced MARC records, XML tools for managing collection metadata, software to browse and search image collections, and a flexible delivery system that allows viewing all, or selected parts, of images at various resolutions and detail. (Parts of this project are supported by third party software such as MrSID from LizardTech.) The system is being implemented initially for a Fine Arts slide collection, and we hope to use it for other types of collections as well.
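
    Delivering selected parts of an image at various resolutions usually relies on some form of multi-resolution pyramid. The sketch below assumes a simple pyramid in which each level halves the width and height of the one before it, and maps a region given in full-resolution pixels to the matching region at a coarser level. The function names are hypothetical, and this is generic arithmetic, not the MrSID format or its API.

```python
from typing import Tuple


def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Image size at a pyramid level; level 0 is full resolution."""
    return max(1, width >> level), max(1, height >> level)


def region_at_level(x: int, y: int, w: int, h: int, level: int) -> Tuple[int, int, int, int]:
    """Map a full-resolution region (x, y, w, h) to coordinates at a coarser level."""
    scale = 1 << level
    left = x // scale
    top = y // scale
    # Ceiling division so the scaled region still covers the requested area.
    right = -(-(x + w) // scale)
    bottom = -(-(y + h) // scale)
    return left, top, right - left, bottom - top


overview = level_size(6000, 4000, 3)
detail = region_at_level(1000, 500, 2048, 1024, 2)
```

    A viewer can then fetch only the pixels for the requested region at the requested level, instead of transferring the full-resolution image.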

    English Renaissance in Context (ERIC)
    ERIC is a three-year, NEH-funded project to create a web site presenting ways in which Shakespeare's plays can be taught using digital facsimiles of original sources and documents. It is a collaborative project involving the School of Arts and Sciences (SAS) at Penn and the Penn Library's Schoenberg Center for Electronic Text & Image (SCETI). It has two distinct components: a set of self-paced tutorials that raise a variety of issues for students, and an introduction to the printing and publishing context of the English Renaissance. ERIC is part of a larger collaborative effort between SAS and the Library to create a major archive of digital facsimiles relating to the English Renaissance, one of the areas of particular strength both among the faculty and in the Library. A completed prototype will be available by summer 2001.
    http://www.library.upenn.edu/etext/collections/furness/eric (Flash player required)

    Franklin to Web
    The Franklin to Web project is standardizing metadata for many of our digital resources, encoding the metadata as MARC records that are filed in our Franklin catalog, and making the metadata searchable and browsable via the Web. Although encoded as MARC, the scope of our metadata goes well beyond ordinary catalog records, and is much more easily searched and filtered than static Web page listings. We are starting by migrating electronic journal and database records. (See the services section above for more details.) We will eventually incorporate digital slide images and selected Web sites as well.

    Museum On-Line Project
    MESL allows members of the Cornell community to view a collection of nearly 9,000 images selected from seven museums and institutions across the United States.

    The Information Base
    The Information Base project is designing interrelated repositories of digital documents and metadata on a common set of principles, supporting a wide range of information formats, use of information in multiple contexts, and long-term preservation. There are several notable aspects to the design:
    • A "life cycle" model that describes how digital resources can be managed from their initial acquisition, preserved for long-term access, and evaluated.
    • A centralized network storage facility (hosted by a recently acquired terabyte-scale disk array) which simplifies access, backup, and integrity checking.
    • Standards for processes, data formats and metadata that reflect the best practices of digital libraries, and that we adapt and extend for our local environment in projects such as Franklin-to-Web.
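
    The integrity checking mentioned above is commonly done by recording a checksum for each object when it is stored and recomputing it later. The following is a minimal sketch under that assumption; the function names, object names, and in-memory data are hypothetical and do not describe the actual storage facility.

```python
import hashlib
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 checksum for an object's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Record a checksum for every object at the time it is stored."""
    return {name: fingerprint(data) for name, data in objects.items()}


def verify(objects: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return the names of objects that are missing or no longer match."""
    damaged = []
    for name, expected in manifest.items():
        data = objects.get(name)
        if data is None or fingerprint(data) != expected:
            damaged.append(name)
    return sorted(damaged)


# Hypothetical stored objects, for illustration only.
store = {"page-001.tif": b"image bytes", "record-001.xml": b"<record/>"}
manifest = build_manifest(store)

# Simulate silent corruption of one object after it was stored.
store["page-001.tif"] = b"image bytez"
problems = verify(store, manifest)
```

    Keeping the manifest alongside backups lets a damaged object be detected and restored before the only good copy ages out.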

    Selective Dissemination of Information (SDI)
    We are starting to plan and propose services for selective dissemination of information. We mean to provide views of Library resources that are customized for particular users and groups, and subscription services for informing users of new information they may be interested in. Early aspects of the project may include bundled, customized versions of our new materials service (see above) and journal contents and abstract services.
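
    At its simplest, selective dissemination means matching newly arrived items against stored interest profiles. The sketch below assumes each item carries a list of subject terms and each user profile is a set of subjects; the field names, users, titles, and the function match_profiles are hypothetical, and the planned services may work quite differently.

```python
from typing import Dict, List, Set


def match_profiles(new_items: List[dict], profiles: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each user, list the titles of new items that share a subject with the profile."""
    notices: Dict[str, List[str]] = {user: [] for user in profiles}
    for item in new_items:
        subjects = set(item["subjects"])
        for user, interests in profiles.items():
            if subjects & interests:
                notices[user].append(item["title"])
    return notices


# Hypothetical profiles and acquisitions, for illustration only.
profiles = {
    "user_a": {"music", "history"},
    "user_b": {"chemistry"},
}
new_items = [
    {"title": "Example Survey of Folk Music", "subjects": ["music"]},
    {"title": "Example Organic Chemistry Text", "subjects": ["chemistry", "education"]},
    {"title": "Example Atlas", "subjects": ["geography"]},
]
notices = match_profiles(new_items, profiles)
```

    The same matching could feed either a customized view of the library's resources or a periodic notification, since both start from the list of items relevant to a profile.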

    Visualization of Information
    One difficulty in finding electronic resources is that plain-text browser windows only let users examine a small amount of information at any given time. We can see much more at one time through nondigital technologies, such as topically-arranged open stacks. Now digital technologies are being developed to make large quantities of digital resources intelligibly browsable, by transforming indexes of metadata into more intuitive graphical designs and layouts. We are considering partnerships with selected developers of these technologies, to test them in our own library services, and determine the most effective uses of this technology. At the same time, we are investigating ways of making our existing text-based indexes more effective in letting users browse, discover, and view our resources.

    III. Specific Digital Library Challenges

    Further populating our information base, so our digital resources and metadata can be more easily searched and presented in different ways.

    Transitioning from fragile URLs to more robust persistent identifiers where possible.

    Designing and experimenting with archives for long-term preservation of digital documents.

    Implementing services for selective dissemination of information, developing experimental citation linking services, and improving cross-collection search services.

    Implementing searchable, archival-quality collections of digital images.

    Supporting geographic information and related datasets.

    Strengthening the open-source library software community, by releasing software we develop, working with other developers to improve their software, and publicizing the benefits of open-source library initiatives.

    © 2000 Council on Library and Information Resources