Digital Library Federation. Newsletter 7/1/01. Columbia University Report

Columbia University
Report to the Digital Library Federation
July 7, 2001

TABLE OF CONTENTS

Collections, Services, and Systems
Projects and Programs

I. Collections, Services, and Systems

A. Collections

Advanced Papyrological Information System (APIS)
Phase II of the Advanced Papyrological Information System (APIS) was completed in March 2001. The searchable APIS database now includes merged metadata records for papyri and ostraca in the six major U.S. university collections (Columbia, Berkeley, Duke, Michigan, Princeton and Yale) and provides links to more than 5,600 unique, high-quality digital images of documentary and literary papyri. The APIS online system provides real-time SQL database searching via the Web and includes translations, browsable listings of subjects and genre types, and links to corresponding transliterations in the Duke Databank of Documentary Papyri (at Project Perseus). In April, NEH awarded another $300,000 grant for Phase III of the project, which will focus on gathering and merging metadata and image links from additional U.S. university collections into the APIS database.
APIS Home Page — APIS Overview — APIS Project Documentation

Digital Scriptorium
Overlapping and extending the initial Mellon grant (which ran through December 2000), Digital Scriptorium is now funded by a grant from NEH (May 1999-June 2002, counting the third-year no-cost extension). The present number of participating institutions stands at twelve, although only those under the original Mellon grant have bibliographic/photographic data available on the web at this point.
An application for funds from the Delmas Foundation is currently pending; it would support development of higher-quality searching and a better user interface. This work will be based on the nearly completed DTD for the description of medieval and Renaissance manuscripts, compiled under the aegis of the TEI.

In February, Luna Imaging released a one-year test of 500 DS images in Luna's Insight software; access is via the project's website. Traditional growth of content, on a library-by-library basis, continues, as two major East Coast universities have informally committed to joining DS via a shared NEH proposal in June 2002.
Digital Scriptorium Public Site — Luna Insight DS Subset

Digital South Asia Library
Columbia participates as a lead institution in the Digital South Asia Library (DSAL), a collaborative endeavor sponsored by the Center for Research Libraries with participation from leading U.S. universities, the Library of Congress, the Asia Society, the British Library, Oxford, Cambridge and a number of institutions in South Asia. Funding for 1999-2002 was provided by the U.S. Department of Education under its grants for Technical Innovation & Cooperation for Foreign Information Access, the Dharam Hinduja Indic Research Center, and other educational foundations. DSAL has created digital versions of South Asia print materials in the following categories: Reference Resources (dictionaries, pedagogical grammars), Images (historical and architectural photo archives; about 300,000 digital images to date), Maps, Statistics (historical statistical compilations, in e-book and spreadsheet formats), Bibliographic databases (catalogs to two major collections in India, article indexing, world union lists of newspaper holdings), and selected e-books and e-journals.
Columbia's ongoing DSAL collaboration with the University of Chicago and the Triangle South Asia Consortium (NC) to create and disseminate online dictionaries for twenty-six modern literary languages of South Asia continues on schedule. Full digital data has been created (through double-keying and automated error detection) for 12 dictionaries so far, with another 5 in process. Negotiations for digital publishing rights for the remaining titles identified by the selection committee are under way. A powerful user interface for browsing, searching, and viewing dictionary entries has been devised and is now in production on the website. The interface for searching image metadata has also been completed.
DSAL Home page — Digital Dictionaries of South Asia

Greene & Greene Virtual Archive
The "Greene & Greene Virtual Archive," a joint project of The Gamble House/USC, Berkeley and Columbia funded by the Getty Foundation, has as its goal the creation of a scholarly web site presenting architectural drawings and photographs by Charles and Henry Greene, American Arts and Crafts movement architects working in the early 20th century. Cataloging (in the form of archival finding aids) and images are being contributed by Columbia, Berkeley, and the Huntington Library; Berkeley is providing technical support for the metadata compilation and web presentation. All of Columbia's cataloging has been completed, and digital imaging of the approximately 5000 slides, photographs and rotogravures will begin in the next few months. Images will be stored at and served from local institutions; MrSID compression technology will provide flexible image navigation and study potential.
At Columbia this project is seen as an operational test of the larger "Digital Aviador" project, which will capture and present a comprehensive set of architectural drawings and photographs from the collections of Avery Library. "Digital Aviador" is the successor to the original Avery/RLG Aviador videodisc project, completed in 1992.
CU Greene & Greene Project Working Page

John Jay Papers
The two-year "Papers of John Jay" project, funded by a $150,00 grant from NEH and supplementary contributions from the Florence Gould Foundation and the Columbia Libraries' Friends fund, will provide an online index to all known documents written by or to John Jay (1745-1829), one of America's "founding fathers," a distinguished statesman and a graduate of Columbia, then King's College. The project includes the creation of a searchable database of descriptive metadata, including abstracts, at the item level for all of the approximately 13,400 unique letters, memos, diaries, etc. in the collection. Some 100,00 page images will be scanned as 8-bit grayscale TIFFs at 300 dpi and linked to the metadata records.
CU Jay Papers Project Working Page

B. Services

Virtual Reading Room Project
The Virtual Reading Room Project is a three-year endeavor to build a technologically enhanced teaching and learning environment that will support undergraduate education at Columbia. Faculty-selected texts, music, images and other material included in the syllabi of Columbia's "Core Curriculum" courses will be converted to digital format and installed on the Columbia server.

Currently bids are being solicited from vendors to perform text conversion and TEI/XML markup for approximately twenty basic texts (e.g., Plato, Machiavelli, Nietzsche, Freud). Licenses and permissions to use these texts have been obtained from the publishers. The plan is to store these texts as XML and serve them as HTML using XSLT scripting.
Virtual Reading Room Project Page
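The store-as-XML, serve-as-HTML plan described above can be illustrated with a minimal sketch. XSLT is not part of the Python standard library, so this example swaps in a hand-written ElementTree transformation to show the same idea; it is not the project's stylesheet. The sample document and the function name xml_to_html are assumptions made for illustration, and only the TEI-style element names (text, body, div, head, p) are borrowed from TEI.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical TEI-like sample; a real core-curriculum text would be far richer.
SAMPLE_XML = """<text>
  <body>
    <div>
      <head>Book I</head>
      <p>First paragraph of the text.</p>
      <p>Second paragraph, with &lt;angle brackets&gt; in it.</p>
    </div>
  </body>
</text>"""


def xml_to_html(xml_source: str) -> str:
    """Convert the stored XML to a simple HTML page.

    Each <head> becomes an <h2> and each <p> stays a <p>. This stands in
    for what an XSLT stylesheet would do with template rules.
    """
    root = ET.fromstring(xml_source)
    parts = ["<html><body>"]
    for div in root.iter("div"):
        for child in div:
            # Re-escape text so markup characters stay safe in the HTML output.
            content = escape(child.text or "")
            if child.tag == "head":
                parts.append(f"<h2>{content}</h2>")
            elif child.tag == "p":
                parts.append(f"<p>{content}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(xml_to_html(SAMPLE_XML))
```

Keeping the XML as the stored master means the HTML presentation can be changed later without touching the marked-up texts, which is presumably the attraction of the approach.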

E-Reserves
Columbia Libraries is testing two options for future use in providing access to reserves materials electronically. Beginning Fall 2001 users can go to the Columbia Course Directory, identify a specific course, click on the Reserves option and access materials that are either linked to products like ProQuest, JSTOR, etc., or are scanned PDF files of course chapters, articles, etc. This option is supported by enhancements to an existing reserves processing system including using Adobe Acrobat's Capture and the extension of Columbia's Name Resolver application to create and manage stable article-level URLs.

Prometheus, a course management system with an electronic reserves component, is being tested in two libraries on the Morningside campus. The current plan is for a broader campus implementation by January 2002.

C. Systems

EJournals Metadata Project
As part of our overall effort to implement a database-driven approach to building the Library's web interface to electronic resources, Columbia's Ejournals Metadata Project has now implemented browsable lists of ejournals on our LibraryWeb site, which are generated weekly from CLIO (Columbia's LMS). For ejournals not cataloged in CLIO, the Project will in the next phase provide a mechanism for distributed input of preliminary or minimal-level ejournal metadata by selectors and reference librarians; this metadata will then be integrated and displayed as part of the catalog-derived ejournal listings. The Project's goal is to fully automate the publication of these listings to the Web: initially by title, publisher, society/sponsor and subject category, and in the next phase by geographic and other content. The Project also addresses the related task of creating "permanent URLs" for ejournals in Columbia's supported e-collections and provides an improved workflow for the systematic proxying of individual ejournal titles. When the Ejournals Metadata Project is complete, the same approach will be extended to other eresource formats such as databases, e-texts, etc.
The application is implemented using our Master Metadata File, the CUL Hierarchical Interface to LC Classification, and our new Name Resolver application. Java and CGI scripting are used for MARC to MMF conversion and HTML page generation. In a later phase, the current batch web page creation will be supplemented or replaced by real-time database retrieval and display.
CUL Ejournals Home Page — Ejournals Project Documentation

Master Metadata File (MMF)
Columbia's Master Metadata File (MMF), a locally developed metadata repository using IBM's DB2 product and built around a MARC-based relational database schema, currently holds about 25,000 bibliographic and structural metadata records for digital collections held locally or accessed remotely. The schema was designed to represent multiple versions, collections, aggregations such as pages in a book, and hierarchies of digital objects. Information may be imported and exported in several formats. The database may also be used as an intermediate architectural component and may be queried interactively.
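As a rough illustration of how a relational schema can represent aggregations and hierarchies of digital objects (a collection containing a book containing pages), the sketch below uses a single self-referencing table and a recursive query. It is not the MMF schema: the table name digital_object, its columns, and the sample rows are all hypothetical, and SQLite is swapped in for DB2 so that the example runs with the Python standard library alone.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical object hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE digital_object (
               id        INTEGER PRIMARY KEY,
               parent_id INTEGER REFERENCES digital_object(id),
               kind      TEXT NOT NULL,
               label     TEXT NOT NULL
           )"""
    )
    # Hypothetical rows: one collection, one book in it, two pages in the book.
    conn.executemany(
        "INSERT INTO digital_object VALUES (?, ?, ?, ?)",
        [
            (1, None, "collection", "Demo collection"),
            (2, 1, "book", "Demo book"),
            (3, 2, "page", "Page 1"),
            (4, 2, "page", "Page 2"),
        ],
    )
    return conn


def descendants(conn: sqlite3.Connection, root_id: int):
    """Return (depth, kind, label) for the root object and everything below it."""
    rows = conn.execute(
        """WITH RECURSIVE tree(id, kind, label, depth) AS (
               SELECT id, kind, label, 0
               FROM digital_object WHERE id = ?
               UNION ALL
               SELECT d.id, d.kind, d.label, tree.depth + 1
               FROM digital_object d JOIN tree ON d.parent_id = tree.id
           )
           SELECT depth, kind, label FROM tree ORDER BY depth, id""",
        (root_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for depth, kind, label in descendants(build_demo_db(), 1):
        print("  " * depth + f"{kind}: {label}")
```

A parent pointer on each row is only one of several ways to model hierarchy in SQL; it is chosen here because it keeps the example short.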

Within the last few months the MMF has been brought into full production for both the APIS Project and the Ejournals Metadata Project. In the first case metadata was obtained, converted and loaded from partner institutions' local cataloging, creating a composite ("union") database at Columbia for papyrological materials which can be queried in real time. In the second case, MARC cataloging data for ejournals is extracted from our local LMS (CLIO) on an ongoing basis, converted and loaded into the MMF. The next phase of this key strategic project includes: a) development of robust and scalable real-time applications running against the MMF; b) extension of the underlying schema to include administrative metadata in order to build applications to support acquisition and management of e-resources. We expect also to look at the new METS structural metadata standard and evaluate the role it might play in our environment.
Metadata Project Documentation

Hierarchical Interface to LC Classification (HILCC)
Columbia's Hierarchical Interface to LC Classification (HILCC) project is intended to test the potential of using the LC Classification numbers provided in standard catalog records to generate a structured menuing system for subject access on the web. The HILCC mapping table — being jointly developed by CUL systems, cataloging and reference staff — associates each LC classification range with vocabulary in a three-level subject tree, for example:

LC Range: GC 1 - GC 1582

Maps to: Sciences — Earth & Environmental Sciences — Oceanography

Call numbers from catalog records extracted from CLIO (Columbia's LMS) are matched against the HILCC mapping table, and a browsable subject category tree is generated on the web to guide users through eresource subject content. HILCC has been used now for two web-based services at Columbia: CLIO Notify and the Browsable Ejournal Subject Listings. The next phase of this project will entail: a) testing and evaluation; b) revising and extending current HILCC mapping; c) developing a strategy for including interdisciplinary resources and other areas not easily mapped to LC Classification; d) comparison with colleague institutions' web-based subject taxonomies.
HILCC Project Documentation — Browsable Ejournal Subject Listings
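The matching step described above (call number in, three-level subject path out) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HILCC implementation: the table holds only the single range quoted in the example above, the call-number parsing is deliberately simplified, and the names HILCC_TABLE, classify and build_tree as well as the sample titles are hypothetical.

```python
import re
from collections import defaultdict

# One row per LC range: (class letters, low, high, three-level subject path).
# Only the range quoted in the report is included here.
HILCC_TABLE = [
    ("GC", 1, 1582, ("Sciences", "Earth & Environmental Sciences", "Oceanography")),
]

# Simplified pattern: class letters followed by the class number.
# Cutter numbers and dates after the class number are ignored.
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def classify(call_number: str):
    """Return the subject path for a call number, or None if unmapped."""
    match = CALL_NUMBER.match(call_number.upper())
    if not match:
        return None
    letters, number = match.group(1), float(match.group(2))
    for cls, low, high, path in HILCC_TABLE:
        if cls == letters and low <= number <= high:
            return path
    return None


def build_tree(records):
    """Group (title, call_number) pairs into a nested three-level tree."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for title, call_number in records:
        path = classify(call_number)
        if path is not None:
            level1, level2, level3 = path
            tree[level1][level2][level3].append(title)
    return tree


if __name__ == "__main__":
    # Hypothetical catalog extract.
    sample = [("Journal A (hypothetical)", "GC 21 .S5"),
              ("Journal B (hypothetical)", "QA 76 .C6")]
    for l1, level2 in build_tree(sample).items():
        for l2, level3 in level2.items():
            for l3, titles in level3.items():
                print(l1, ">", l2, ">", l3, ":", titles)
```

Records whose call numbers fall outside every range are simply skipped in this sketch; the interdisciplinary resources mentioned in item c) are the kind of case such a lookup would miss.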

NetLibrary Integration
In late March, 27,364 MARC records for netLibrary books were loaded into the Libraries' online catalog, CLIO. Under a consortial arrangement including Cornell, Dartmouth, and Middlebury College, the full contents of netLibrary are available to users at all four institutions. As soon as a book has been checked out twice by users at any of the four, it is purchased for the consortium, with funds coming from a deposit account. This arrangement will be reevaluated each year, as we gain more experience with use patterns. Having the records in CLIO showed immediate results in dramatically increased use. Records have been loaded in two sets, with separate markers for those books that have been purchased by the consortium and those that are only available for purchase through use. These markers will allow us to manage future access should the terms of the arrangement change.
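The purchase-on-second-checkout rule can be written down as a small sketch. The threshold of two checkouts and the deposit account come from the description above; everything else (the class name ConsortiumAccount, the marker strings, the prices) is an assumption made for illustration and does not describe netLibrary's actual system.

```python
from dataclasses import dataclass, field

PURCHASE_THRESHOLD = 2  # from the text: a book is purchased once checked out twice


@dataclass
class ConsortiumAccount:
    """Tracks checkouts across all member institutions and the shared deposit."""
    deposit: float
    checkouts: dict = field(default_factory=dict)
    purchased: set = field(default_factory=set)

    def check_out(self, book_id: str, price: float) -> bool:
        """Record one checkout; return True if it triggers a purchase."""
        if book_id in self.purchased:
            return False  # already owned, nothing to charge
        self.checkouts[book_id] = self.checkouts.get(book_id, 0) + 1
        if self.checkouts[book_id] >= PURCHASE_THRESHOLD:
            self.purchased.add(book_id)
            self.deposit -= price  # funds come from the deposit account
            return True
        return False

    def marker(self, book_id: str) -> str:
        """Hypothetical catalog marker distinguishing the two record sets."""
        return "purchased" if book_id in self.purchased else "available-for-purchase"


if __name__ == "__main__":
    account = ConsortiumAccount(deposit=100.0)
    account.check_out("book-1", price=30.0)   # first use, no purchase
    account.check_out("book-1", price=30.0)   # second use triggers purchase
    print(account.marker("book-1"), account.deposit)
```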

Name Resolver/URN Implementation
In Fall 2000 the Libraries implemented a name resolver application which is now used for access to both licensed and free electronic resources in our digital collections. The name resolver allows the creation of persistent URLs (URNs) for display in the library's OPAC and in Web presentations of electronic resources. Maintenance and updating of actual e-resource URLs & addresses are now handled by library systems staff in special directory tables referenced by the name resolution scripts, rather than in the library's catalog.

As part of the name resolver's implementation, a one-time global update was done against the online catalog in order to add local 956 fields (parallel to MARC 856 fields) containing URNs to all existing catalog records for eresources. New and changed catalog records are now identified on a weekly schedule and updated with 956 fields containing URNs. The same process generates a data feed used to update the actual name resolver tables and to flag resources that require "proxying" and other types of authorization-related maintenance. A recent enhancement to the system allows for real-time, ad hoc assignment of URNs by staff outside library systems, e.g., when placing newly scanned articles and folders on electronic reserve. Log records are created for all name resolver transactions and will be used as a way of gathering more complete and consistent statistics on the use of licensed commercial web resources by the Columbia community.
Name Resolver Documentation
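The resolver pattern described above (a stable name in the catalog, the real address in a separately maintained table, and a log entry per transaction) can be sketched as follows. This is an illustration only, not Columbia's application: the class and method names, the URN strings, the example URLs, and the idea of handling "proxying" by prepending a proxy prefix are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Target:
    url: str
    needs_proxy: bool = False


class NameResolver:
    """Maps persistent names to current URLs and logs every lookup."""

    def __init__(self, proxy_prefix: str):
        self.proxy_prefix = proxy_prefix  # hypothetical proxy login URL
        self.table = {}                   # urn -> Target, maintained by systems staff
        self.log = []                     # (timestamp, urn, found) tuples

    def register(self, urn: str, url: str, needs_proxy: bool = False) -> None:
        """Add or update an entry; the catalog record keeps pointing at the URN."""
        self.table[urn] = Target(url, needs_proxy)

    def resolve(self, urn: str):
        """Return the current URL for a URN, or None if it is unknown."""
        target = self.table.get(urn)
        self.log.append((datetime.now(timezone.utc), urn, target is not None))
        if target is None:
            return None
        if target.needs_proxy:
            return self.proxy_prefix + target.url
        return target.url


if __name__ == "__main__":
    resolver = NameResolver("https://proxy.example.edu/login?url=")
    resolver.register("urn:example:ejournal1", "https://publisher.example.com/j1",
                      needs_proxy=True)
    print(resolver.resolve("urn:example:ejournal1"))
    # When the publisher moves the journal, only the table entry changes.
    resolver.register("urn:example:ejournal1", "https://publisher.example.com/new/j1",
                      needs_proxy=True)
    print(resolver.resolve("urn:example:ejournal1"))
```

The point of the indirection is visible in the usage: the name stored in the catalog never changes, while the address behind it can be updated in one place.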

XML/Application servers
Over the last year Columbia's Academic Computing group has implemented a set of tools for the development of XML and JSP based applications. Applications currently have direct access to DB2 databases, and are being built as front-ends to the Master Metadata File (see above), to "portal" frameworks, such as the JA-SIG portal, and to instructional management platforms.

II. Projects and Programs

Electronic Publishing Initiative at Columbia (EPIC)

Columbia Pubscape: A Core Integration System for a National Science Digital Library Publishing Center. Under the auspices of the Electronic Publishing Initiative at Columbia (EPIC), a university-based organization involving Columbia University Press, the Libraries, and the Academic Information Systems computing center, we will create mechanisms for the development, implementation, and sustainability of innovative, cost-efficient, and high quality digital library resources designed for the enhancement of teaching and learning in science. Building on our existing infrastructure for creating award winning online publications, Columbia University will develop models for rights management sustainability along with a scalable, interoperable technology framework necessary for a Core Integration System (CIS) for the National Science Digital Library (NSDL). Expected outcomes of the project include the development of a set of organizational and operational models for the successful implementation and long-term sustainability of the NSDL.

Key features of the project are: 1) creation of an intellectual property and rights management system; 2) creation of business models, license agreements, and sustainability mechanisms; and 3) development of a scalable, interoperable technology framework.

Online Publishing Use and Costs Evaluation Program
EPIC has received support for a three-year program to assess the costs and use of electronic publishing projects of the kind now being undertaken by the Center and many other universities and publishers. While the online scholarly publications developed at Columbia and elsewhere have yielded valuable information concerning the best technologies, tools and processes for creating these resources, less has been done to analyze their continuing costs and ongoing value to users. The Online Publishing Use and Costs Evaluation Program will gather and analyze data to help answer the important questions that remain, namely: how electronic publishing projects affect the cost of the scholarly communications process as a whole throughout the life cycle of the publication; how the continued use of these electronic publications affects the research and teaching patterns of scholars and students, both qualitatively and quantitatively; and which financial models will allow for sustainability of these products over the long term. This program will contribute to the development of a generalized model for evaluation of online publications and propose strategies for integrating evaluation, analysis and learning as ongoing functions of the online publishing process.
EPIC Home Page — Pubscape

Electronic Text Service (ETS)
As a public service unit, the Electronic Text Service has continued to focus its primary effort on developing and providing access to the libraries' digital text holdings in the humanities and history, maintaining areas of traditional strength, such as Classics, Medieval studies, English and American literature, the history of philosophy, religion, and American history, while making good progress toward bringing coverage in French, German, Italian, and Spanish up to comparable levels. The past year has also seen significant growth in our holdings of contemporary multimedia and hypertext fiction, selected in consultation with a member of the English department doing research in this field.
ETS has also contributed to Columbia's digital library tools and infrastructure by such activities as: preparing TEI/XML markup specifications for our Virtual Reading Room project; helping develop local expertise in the use of XSLT; creating the capacity for OCR'ing of microform text resources; and by taking a lead role in assisting members of the Columbia community wishing to use personal bibliographic software programs such as ProCite or EndNote for Z39.50 searching & downloading of citations from digital library databases.
ETS Home Page

Electronic Data Service (EDS)
EDS is the University's numerical data library and support center for social science computing. The EDS DataGate, perhaps the first Web finding tool in the field (1994), now provides direct, Web-enabled download of dataset files, along with all available documentation and programs. Under the leadership of Jane Weintrop, the new Data Resources Librarian, delivery and preparations for supporting the 2000 Census, in all its permutations, are underway. Nearly 50 gigabytes of data (compressed) is already available online for use or download, with many more gigabytes expected from the Census. The development of new service models to support the world of direct data access twenty-four hours a day will be a focus of EDS efforts in the future. In the coming year, EDS will also host an NSF-funded usability test on interfaces to government data and examine extended data delivery systems.
EDS Home Page — EDS Datagate

BorrowDirect
The BorrowDirect Project, a collaborative effort by Columbia University, the University of Pennsylvania, Yale University, and RLG, will move to a new phase in July 2001. Originally called the "CoPY Project," BorrowDirect's objective was to test whether using specified "lenders of first resort" could substantially lower the unit cost of traditional interlibrary loan while maintaining high service standards. With more than 4,000 requests filled during the first 18 months, the project demonstrated a cost-effective alternative to traditional ILL, with better service for its users. With the end of the pilot phase, the three university libraries will take over project coordination from RLG and work collaboratively to extend and improve the service.
BorrowDirect allows patrons to access a "virtual catalog" that combines the collections of Columbia, Penn, and Yale. Users initiate a request for a known citation that is searched across the three catalogs using Z39.50 broadcast searching. A customized software package lets user requests bypass the borrowing library's ILL office and go directly to the potential lending location. Links to each library's data files authenticate the patron, check the circulation status, and determine the availability of the requested item. Automatic e-mail to the patron confirms the request and additional e-mails are sent as the request progresses. The libraries use a commercial overnight delivery service to enable a four-working-day turnaround from the initial request to notification of pickup availability.
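The request flow described above (authenticate the patron, search all three catalogs, check circulation status, and send the request straight to a lender) can be sketched as follows. Real Z39.50 broadcast searching is replaced here by lookups in in-memory dictionaries, and the catalog contents, patron IDs, function names, and the rule of picking the first other library with an available copy are all assumptions for illustration rather than the behavior of the BorrowDirect software.

```python
# Hypothetical holdings: library -> {item id: circulation status}.
CATALOGS = {
    "Columbia": {"item-1": "available", "item-2": "checked out"},
    "Penn": {"item-2": "available"},
    "Yale": {"item-1": "available", "item-2": "available"},
}

# Hypothetical patron files used for authentication.
PATRONS = {
    "Columbia": {"cu-001"},
    "Penn": {"pn-001"},
    "Yale": {"yl-001"},
}


def broadcast_search(item_id: str) -> dict:
    """Ask every catalog about the item; return {library: status} for hits."""
    return {
        library: holdings[item_id]
        for library, holdings in CATALOGS.items()
        if item_id in holdings
    }


def route_request(home_library: str, patron_id: str, item_id: str):
    """Return the lending library chosen for the request, or None.

    The request bypasses the home library's ILL office: it goes directly
    to the first other library whose copy is available.
    """
    if patron_id not in PATRONS.get(home_library, set()):
        return None  # authentication failed
    for library, status in broadcast_search(item_id).items():
        if library != home_library and status == "available":
            return library
    return None


if __name__ == "__main__":
    print(route_request("Columbia", "cu-001", "item-2"))
```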

Last updated: Monday August 13 2001
© 2000 Council on Library and Information Resources
