Columbia University
Report to the Digital Library Federation
July 7, 2001
A. Collections
Advanced Papyrological Information System (APIS)
Phase II of the Advanced Papyrological Information System (APIS) was
completed in March 2001. The searchable APIS database now includes merged
metadata records for papyri and ostraca in the six major U.S. university
collections (Columbia, Berkeley, Duke, Michigan, Princeton and Yale) and
provides links to more than 5600 unique, high-quality digital images of
documentary and literary papyri. The APIS online system provides
real-time SQL database searching via the Web and includes translations,
browsable listings of subjects and genre types and links to corresponding
transliterations in the Duke Databank of Documentary Papyri (at Project
Perseus). In April, NEH awarded another $300,000 grant for
a Phase III of the project, which will focus on gathering and merging
metadata and image links from additional U.S. university collections into
the APIS database.
Home Page — APIS
Overview — APIS
Project Documentation
Digital Scriptorium
Overlapping and extending the initial Mellon grant (which ran through
December 2000), Digital Scriptorium is now funded by a grant from NEH
(May 1999-June 2002, counting the third-year no-cost extension). The
present number of participating institutions stands at twelve, although
only those under the original Mellon grant have bibliographic/photographic
data available on the web at this point.
An application to the Delmas Foundation is currently pending for funds
to develop higher-quality searching and a better user interface;
this work will be based on the nearly complete DTD for the description
of medieval and Renaissance manuscripts, compiled under the aegis of the TEI.
In February, Luna Imaging released a one-year test of 500 DS images
in Luna's Insight software; access is via the project's website. Traditional
growth of content, on a library-by-library basis, continues, as two
major East Coast universities have informally committed to joining DS
via a shared NEH proposal in June 2002.
Digital Scriptorium
Public Site — Luna
Insight DS Subset
Digital South Asia Library
Columbia participates as a lead institution in the Digital South Asia
Library (DSAL), a collaborative endeavor sponsored by the Center for Research
Libraries with participation from leading U.S. universities, the Library
of Congress, the Asia Society, the British Library, Oxford, Cambridge
and a number of institutions in South Asia. Funding for 1999-2002 was
provided by the U.S. Department of Education under its grants for Technological
Innovation and Cooperation for Foreign Information Access, the Dharam
Hinduja Indic Research Center, and other educational foundations. DSAL
has created digital versions of South Asia print materials in the following
categories: Reference Resources (dictionaries, pedagogical grammars),
Images (historical and architectural photo archives — about 300,000
digital images to date), Maps, Statistics (historical statistical compilations,
in e-book and spreadsheet formats), Bibliographic databases (catalogs
to two major collections in India, article indexing, world union lists
of newspaper holdings), and selected e-books and e-journals.
Columbia's ongoing DSAL collaboration with the University of Chicago
and the Triangle South Asia Consortium (NC) to create and disseminate
online dictionaries for twenty-six modern literary languages of South
Asia continues on schedule. Full digital data has been created
(through double-keying and automated error detection) for 12 dictionaries
so far, with another 5 in process. Negotiations for digital
publishing rights for the remaining titles identified by the selection
committee are under way. A user interface for
browsing, searching, and viewing dictionary entries has been developed
and is in production on the website. The interface
for searching image metadata has also been completed.
DSAL Home page — Digital
Dictionaries of South Asia
Greene & Greene Virtual Archive
The "Greene & Greene Virtual Archive" (a joint
project of The Gamble House/USC, Berkeley and Columbia, funded by the
Getty Foundation) has as its goal the creation of a scholarly web
site presenting architectural drawings and photographs by Charles and
Henry Greene, American Arts and Crafts movement architects working in
the early 20th century. Cataloging (in the form of archival finding
aids) and images are being contributed by Columbia, Berkeley, and the
Huntington Library; Berkeley is providing technical support for the metadata
compilation and web presentation. All of Columbia's cataloging has
been completed and digital imaging of the approximately 5000 slides, photographs
and rotogravures will begin in the next few months. Images will
be stored at and served from local institutions; MrSID compression technology
will allow flexible image navigation and close study.
At Columbia this project is seen as an operational test of the
larger "Digital Aviador" project, which will
capture and present a comprehensive set of architectural drawings and
photographs from the collections of Avery Library. "Digital Aviador"
is the successor to the original Avery/RLG Aviador videodisc
project, completed in 1992.
CU Greene & Greene Project Working Page
John Jay Papers
The two-year "Papers of John Jay" project, funded by a $150,00
grant from NEH and supplementary contributions from the Florence Gould
Foundation and the Columbia Libraries' Friends fund, will provide an
online index to all known documents written by or to John Jay (1745-1829),
one of America's 'founding fathers,' a distinguished statesman and a
graduate of Columbia, then King's College. The project includes
the creation of a searchable database of descriptive metadata —
including abstracts — at the item level for all of the approximately
13,400 unique letters, memos, diaries, etc. in the collection. Some
100,00 page images will be scanned as 8-bit grayscale TIFFs at 300 dpi
and linked to the metadata records.
Jay Papers Project Working Page
B. Services
Virtual Reading Room Project
The Virtual Reading Room Project is a three-year endeavor to build
a technologically enhanced teaching and learning environment that
will support undergraduate education at Columbia. Faculty-selected
texts, music, images and other material included in the syllabi of
Columbia's "Core Curriculum" courses will be converted
to digital format and installed on the Columbia server.
Currently bids are being solicited from vendors to perform text conversion
and TEI/XML markup for approximately twenty basic texts (e.g., Plato,
Machiavelli, Nietzsche, Freud). Licenses and permissions to
use these texts have been obtained from the publishers. The
plan is to store these texts as XML and serve them as HTML using XSLT.
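The store-as-XML, serve-as-HTML idea can be illustrated with a minimal sketch. The project plans to use XSLT; Python's standard library has no XSLT processor, so the sketch substitutes a hand-written ElementTree transformation as a stand-in for a stylesheet. The element names (text, head, p), the sample content, and the tag mapping are all hypothetical and are not the project's TEI/XML markup specification.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified source text (not the project's markup).
SOURCE = """<text>
  <head>Sample Title</head>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</text>"""

# Assumed mapping from source elements to HTML tags (illustrative only).
TAG_MAP = {"head": "h1", "p": "p"}


def to_html(xml_source):
    """Convert the simplified XML document into an HTML fragment."""
    root = ET.fromstring(xml_source)
    parts = []
    for child in root:
        html_tag = TAG_MAP.get(child.tag)
        if html_tag is None:
            continue  # skip elements this sketch does not know about
        content = escape((child.text or "").strip())
        parts.append(f"<{html_tag}>{content}</{html_tag}>")
    return "\n".join(parts)


print(to_html(SOURCE))
```

In a real deployment an XSLT stylesheet would play the role of TAG_MAP and to_html, so presentation could change without touching the stored XML.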
Reading Room Project Page
Columbia Libraries is testing two options for future use in providing
access to reserves materials electronically. Beginning Fall 2001
users can go to the Columbia Course Directory, identify a specific course,
click on the Reserves option and access materials that are either linked
to products like ProQuest, JSTOR, etc., or are scanned PDF files of
course chapters, articles, etc. This option is supported by enhancements
to an existing reserves processing system including using Adobe Acrobat's
Capture and the extension of Columbia's Name Resolver application to
create and manage stable article-level URLs.
Prometheus, a course management system with an electronic reserves component,
is being tested in two libraries on the Morningside campus. The
current plan is for a broader campus implementation by January 2002.
C. Systems
EJournals Metadata Project
As part of our overall effort to implement a database-driven approach
to building the Library's web interface to electronic resources, Columbia's
Ejournals Metadata Project has now implemented browsable lists of ejournals
on our LibraryWeb site which are generated weekly from CLIO (Columbia's
LMS). For ejournals not cataloged in CLIO, the Project will in the
next phase provide a mechanism for distributed, selector/reference librarian
input of preliminary or minimal-level ejournal metadata which will then
be integrated and displayed as part of the catalog-derived ejournal listings.
The Project's goal is to fully automate the publication to the Web
of these listings: initially by title, publisher, society/sponsor
and subject category; in the next phase, by geographic and other content.
The Project also addresses the related task of creating "permanent
URLs" for ejournals in Columbia's supported e-collections and provides
an improved workflow for the systematic proxying of individual ejournal
titles. When the Ejournals Metadata Project is complete, the same
approach will be extended to other eresource formats such as databases,
e-texts, etc.
The application is implemented using our Master Metadata File, the CUL
Hierarchical Interface to LC Classification, and our new Name Resolver
application. Java and CGI scripting are used for MARC to MMF conversion
and HTML page generation. In a later phase, the current batch
web page creation will be supplemented or replaced by real-time database
retrieval and display.
Ejournals Home Page — Ejournals
Project Documentation
Master Metadata File (MMF)
Columbia's Master Metadata File (MMF), a locally developed metadata
repository using IBM's DB2 product and built around a MARC-based relational
database schema, currently holds about 25,000 bibliographic and
structural metadata records for digital collections held locally or accessed
remotely. The schema was designed to be able to represent multiple
versions, collections, aggregations such as pages in a book, and hierarchies
of digital objects. Information may be imported and exported in several
formats. The database also may be used as an intermediate architectural
component and may be queried interactively.
Within the last few months the MMF has been brought into full production
for both the APIS Project and the Ejournals Metadata Project.
In the first case metadata was obtained, converted and loaded from partner
institutions' local cataloging, creating a composite ("union")
database at Columbia for papyrological materials which can be queried
in real time. In the second case, MARC cataloging data for ejournals
is extracted from our local LMS (CLIO) on an ongoing basis, converted
and loaded into the MMF. The next phase of this key strategic
project includes: a) development of robust and scalable real-time applications
running against the MMF; b) extension of the underlying schema to include
administrative metadata in order to build applications to support acquisition
and management of e-resources. We expect also to look at the new
METS structural metadata standard and evaluate the role it might play
in our environment.
Metadata Project Documentation
Hierarchical Interface to LC Classification
Columbia's Hierarchical Interface to LC Classification (HILCC) project
is intended to test the potential of using the LC Classification numbers
provided in standard catalog records to generate a structured menuing
system for subject access on the web. The HILCC mapping table
— being jointly developed by CUL systems, cataloging and reference
staff — associates each LC classification range with vocabulary
in a three-level subject tree, for example:
LC Range:  GC 1 - GC 1582
Maps to:   Sciences > Earth & Environmental Sciences > Oceanography
Call numbers from catalog records extracted from CLIO (Columbia's
LMS) are matched against the HILCC mapping table, and a browsable subject
category tree is generated on the web to guide users through eresource
subject content. HILCC has been used now for two web-based services
at Columbia: CLIO Notify and the Browsable Ejournal Subject Listings.
The next phase of this project will entail: a) testing and evaluation;
b) revising and extending current HILCC mapping; c) developing a strategy
for including interdisciplinary resources and other areas not easily
mapped to LC Classification; d) comparison with colleague institutions'
web-based subject taxonomies.
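The matching of call numbers against the HILCC mapping table, described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: only the GC 1 - GC 1582 row comes from this report, and the call-number parsing rule, the function names, and the sample titles are hypothetical.

```python
import re
from typing import NamedTuple


class Mapping(NamedTuple):
    lc_class: str        # class letters, e.g. "GC"
    low: float           # start of the numeric range
    high: float          # end of the numeric range
    subject_path: tuple  # three-level subject tree entry


# Only this row is taken from the report; a real table has many more rows.
HILCC_TABLE = [
    Mapping("GC", 1, 1582,
            ("Sciences", "Earth & Environmental Sciences", "Oceanography")),
]

# Assumed parsing rule: leading class letters, then a (possibly decimal) number.
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def classify(call_number):
    """Return the subject path for a call number, or None if unmapped."""
    match = CALL_NUMBER.match(call_number.upper())
    if match is None:
        return None
    letters, number = match.group(1), float(match.group(2))
    for row in HILCC_TABLE:
        if row.lc_class == letters and row.low <= number <= row.high:
            return row.subject_path
    return None


def build_tree(records):
    """Group (title, call_number) pairs into a nested subject tree."""
    tree = {}
    for title, call_number in records:
        path = classify(call_number)
        if path is None:
            continue  # unmapped resources need separate handling
        node = tree
        for level in path[:-1]:
            node = node.setdefault(level, {})
        node.setdefault(path[-1], []).append(title)
    return tree


# Hypothetical catalog extract.
records = [("Hypothetical Ocean Journal", "GC11.2 .S65"),
           ("Unmapped Title", "QA76.9")]
print(build_tree(records))
```

Titles whose call numbers fall outside every range are skipped here, which is where a strategy for interdisciplinary or hard-to-map resources would come in.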
Documentation — Browsable
Ejournal Subject Listings
NetLibrary Integration
In late March, 27,364 MARC records for netLibrary books were loaded
into the Libraries' online catalog, CLIO. Under a consortial arrangement
including Cornell, Dartmouth, and Middlebury College, the full contents
of netLibrary are available to users at all four institutions. As
soon as a book has been checked out twice by users at any of the four,
it is purchased for the consortium, with funds coming from a deposit
account. This arrangement will be reevaluated each year, as we
gain more experience with use patterns. Having the records in
CLIO showed immediate results in dramatically increased use. Records
have been loaded in two sets, with separate markers for those books
that have been purchased by the consortium and those that are only available
for purchase through use. These markers will allow us to manage
future access should the terms of the arrangement change.
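The purchase-on-second-use rule and the two record markers can be sketched in a few lines. The threshold of two checkouts comes from the arrangement described above; the class, the marker strings, the prices, and the deposit balance are hypothetical, and the sketch ignores what happens if the deposit account runs short.

```python
from collections import Counter

PURCHASE_THRESHOLD = 2  # from the report: bought once checked out twice


class Consortium:
    """Tracks checkouts across all member institutions (illustrative only)."""

    def __init__(self, deposit_balance):
        self.deposit_balance = deposit_balance  # hypothetical deposit account
        self.checkouts = Counter()
        self.purchased = set()

    def check_out(self, book_id, price):
        """Record one checkout; return True if it triggered a purchase."""
        if book_id in self.purchased:
            return False  # already owned, nothing more to buy
        self.checkouts[book_id] += 1
        if self.checkouts[book_id] >= PURCHASE_THRESHOLD:
            self.deposit_balance -= price
            self.purchased.add(book_id)
            return True
        return False

    def marker(self, book_id):
        """Return the (hypothetical) catalog marker for a title."""
        if book_id in self.purchased:
            return "purchased"
        return "available-for-purchase"


consortium = Consortium(deposit_balance=100.0)
first = consortium.check_out("book-1", price=30.0)   # first use, no purchase
second = consortium.check_out("book-1", price=30.0)  # second use, purchase
print(first, second, consortium.deposit_balance, consortium.marker("book-1"))
```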
Name Resolver/URN Implementation
In Fall 2000 the Libraries implemented a name resolver application
which is now used for access to both licensed and free electronic resources
in our digital collections. The name resolver allows the creation
of persistent URLs (URNs) for display in the library's OPAC and in Web
presentations of electronic resources. Maintenance and updating
of actual e-resource URLs & addresses are now handled by library
systems staff in special directory tables referenced by the name resolution
scripts, rather than in the library's catalog.
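The indirection a name resolver provides can be shown with a minimal sketch: the catalog stores only a persistent name, and a separately maintained table maps that name to the current address. The URN format, the URLs, and the function names below are hypothetical and do not describe Columbia's resolver.

```python
# Hypothetical directory table maintained by systems staff:
# persistent name -> current address of the e-resource.
RESOLVER_TABLE = {
    "cul.example.0001": "https://publisher.example.org/journal/0001",
}


def resolve(urn):
    """Return the current URL for a persistent name, or None if unknown."""
    return RESOLVER_TABLE.get(urn)


def update_target(urn, new_url):
    """Point an existing persistent name at a new address."""
    if urn not in RESOLVER_TABLE:
        raise KeyError(f"unknown URN: {urn}")
    RESOLVER_TABLE[urn] = new_url


before = resolve("cul.example.0001")
# The publisher moves the resource; only the table changes, not the catalog.
update_target("cul.example.0001", "https://newhost.example.org/j/0001")
after = resolve("cul.example.0001")
print(before, after)
```

Because the catalog record carries only the persistent name, a change of address requires one table update rather than an edit to every record that cites the resource.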
As part of the name resolver's implementation, a one-time global update
was done against the online catalog in order to add local 956 fields
(parallel to MARC 856 fields) containing URNs to all existing catalog
records for eresources. New and changed catalog records are now
identified on a weekly schedule and updated with 956 fields containing
URNs. The same process generates a data feed used to update
the actual name resolver tables and to flag resources that require "proxying"
and other types of authorization-related maintenance. A recent
enhancement to the system allows for real-time, ad hoc assignment of
URNs by staff outside library systems, e.g., when placing newly scanned
articles and folders on electronic reserve. Log records are created
for all name resolver transactions and will be used as a way of gathering
more complete & consistent statistics on the use of licensed commercial
web resources by the Columbia community.
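The weekly update described above can be sketched as a single pass that adds 956 fields and emits a feed for the resolver tables. This is a simplified illustration: records are plain dictionaries rather than MARC, only new records are handled (not changed ones), and the resolver base URL, the URN scheme, and the rule for flagging resources that need proxying are all assumptions.

```python
from urllib.parse import urlparse

RESOLVER_BASE = "https://resolver.example.edu/"  # hypothetical resolver address
LICENSED_HOSTS = {"licensed.example.com"}        # hypothetical proxy rule


def weekly_update(records):
    """Add 956 fields to records that lack them and return the data feed."""
    feed = []
    for record in records:
        fields = record["fields"]
        if fields.get("956") or not fields.get("856"):
            continue  # already has URNs, or has no URL to resolve
        for n, url in enumerate(fields["856"], start=1):
            urn = f"{record['id']}.{n}"  # assumed URN scheme
            fields.setdefault("956", []).append(RESOLVER_BASE + urn)
            feed.append({
                "urn": urn,
                "target": url,
                "proxy": urlparse(url).hostname in LICENSED_HOSTS,
            })
    return feed


# Hypothetical catalog extract: one new record, one already processed.
catalog = [
    {"id": "rec1", "fields": {"856": ["https://licensed.example.com/j1"]}},
    {"id": "rec2", "fields": {"856": ["https://free.example.org/x"],
                              "956": ["https://resolver.example.edu/rec2.1"]}},
]
feed = weekly_update(catalog)
print(feed)
```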
Resolver Documentation
XML/Application servers
Over the last year Columbia's Academic Computing group has implemented
a set of tools for the development of XML and JSP based applications.
Applications currently have direct access to DB2 databases, and
are being built as front-ends to the Master Metadata File (see
above), to "portal" frameworks, such as the JA-SIG portal,
and to instructional management platforms.
Electronic Publishing Initiative at Columbia
Columbia Pubscape: A Core Integration System for a National Science
Digital Library Publishing Center. Under the auspices of the Electronic
Publishing Initiative
at Columbia (EPIC), a university-based organization involving Columbia
University Press, the Libraries, and the Academic Information Systems
computing center, we will create mechanisms for the development,
implementation, and sustainability of innovative, cost-efficient,
and high quality digital library resources designed for the enhancement
of teaching and learning in science. Building on our existing
infrastructure for creating award-winning online publications, Columbia
University will develop models for rights management sustainability
along with a scalable, interoperable technology framework necessary
for a Core Integration System (CIS) for the National Science Digital
Library (NSDL). Expected outcomes of the project include the
development of a set of organizational and operational models for
the successful implementation and long-term sustainability of the
Key features of the project are: 1) creation of an intellectual
property and rights management system; 2) creation of business models,
license agreements, and sustainability mechanisms; and 3) development
of a scalable, interoperable technology framework.
Online Publishing Use and Costs Evaluation
EPIC has received support for a three-year program to assess
the costs and use of electronic publishing projects of the kind
now being undertaken by the Center and many other universities and
publishers. While the online scholarly publications developed at
Columbia and elsewhere have yielded valuable information concerning
the best technologies, tools and processes for creating these resources,
less has been done to analyze their continuing costs and ongoing
value to users. The Online Publishing Use and Costs Evaluation
Program will gather and analyze data to help answer
the important questions that remain, namely: how electronic
publishing projects affect the cost of the scholarly communications
process as a whole throughout the life cycle of the publication;
how the continued use of these electronic publications affects the
research and teaching patterns of scholars and students both qualitatively
and quantitatively; and which financial models will allow for sustainability
of these products over the long term. This program will contribute
to the development of a generalized model for evaluation of online
publications and propose strategies for integrating evaluation,
analysis and learning as ongoing functions of the online publishing
EPIC Home Page —
Electronic Text Service (ETS)
As a public service unit, the Electronic Text Service has continued
to focus its primary effort on developing and providing access to
the libraries' digital text holdings in the humanities and history,
maintaining areas of traditional strength, such as Classics, Medieval
studies, English and American literature, the history of philosophy,
religion, and American history, while making good progress toward
bringing coverage in French, German, Italian, and Spanish up to comparable
levels. The past year has also seen significant growth in our holdings
of contemporary multimedia and hypertext fiction, selected in consultation
with a member of the English department doing research in this field.
ETS has also contributed to Columbia's digital library tools and
infrastructure by such activities as: preparing TEI/XML markup specifications
for our Virtual Reading Room project; helping develop local expertise
in the use of XSLT; creating the capacity for OCR'ing of microform
text resources; and by taking a lead role in assisting members of
the Columbia community wishing to use personal bibliographic software
programs such as ProCite or EndNote for Z39.50 searching & downloading
of citations from digital library databases.
ETS Home Page
Electronic Data Service (EDS)
EDS is the University's numerical data library and support center
for social science computing. The EDS DataGate, perhaps the
first Web finding tool in the field (1994), now provides direct, Web-enabled
download of dataset files, along with all available documentation
and programs. Under the leadership of Jane Weintrop, the new
Data Resources Librarian, delivery and preparations for supporting
the 2000 Census, in all its permutations, are underway. Nearly
50 gigabytes of data (compressed) is already available online for
use or download, with many more gigabytes expected from the Census.
The development of new service models to support the world of
direct data access twenty-four hours a day will be a focus of EDS
efforts in the future. In the coming year, EDS will also host
an NSF-funded usability test on interfaces to government data and
examine extended data delivery systems.
EDS Home Page
The BorrowDirect Project—a collaborative effort by Columbia University,
the University of Pennsylvania, Yale University, and RLG—will
move to a new phase in July 2001. Originally called the
"CoPY Project," BorrowDirect's objective was to test whether
using specified "lenders of first resort" could substantially
lower the unit cost of traditional interlibrary loan while maintaining
high service standards. With more than 4,000 requests filled
during the first 18 months, the project demonstrated a cost-effective
alternative to traditional ILL, with better service for its users.
With the end of the pilot phase, the three university libraries will
take over project coordination from RLG and work collaboratively to
extend and improve the service.
BorrowDirect allows patrons to access a "virtual catalog"
that combines the collections of Columbia, Penn, and Yale. Users
initiate a request for a known citation that is searched across the
three catalogs using Z39.50 broadcast searching. A customized
software package lets user requests bypass the borrowing library's
ILL office and go directly to the potential lending location. Links
to each library's data files authenticate the patron, check the circulation
status, and determine the availability of the requested item. Automatic
e-mail to the patron confirms the request and additional e-mails are
sent as the request progresses. The libraries use a commercial
overnight delivery service to enable a four-working-day turnaround
from the initial request to notification of pickup availability.
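The request flow described above (broadcast search, patron authentication, availability check, direct routing to a lender) can be sketched as follows. The sketch uses in-memory dictionaries in place of Z39.50 searches and live circulation data; the holdings, status values, item identifiers, and the rule of skipping the patron's own library are all hypothetical.

```python
# Hypothetical holdings with circulation status, one dictionary per catalog.
CATALOGS = {
    "Columbia": {"item-1": "available"},
    "Penn": {"item-1": "checked out", "item-2": "available"},
    "Yale": {"item-2": "available"},
}


def broadcast_search(item_id):
    """Search every catalog and return {library: status} for each holder."""
    return {library: holdings[item_id]
            for library, holdings in CATALOGS.items()
            if item_id in holdings}


def place_request(patron_library, item_id, patron_is_valid=True):
    """Return the lending library chosen for a request, or None."""
    if not patron_is_valid:
        return None  # authentication failed; no request is placed
    for library, status in broadcast_search(item_id).items():
        # Assumption: requests go to another member with the item on shelf.
        if library != patron_library and status == "available":
            return library
    return None


print(place_request("Columbia", "item-2"))
print(place_request("Yale", "item-1"))
print(place_request("Columbia", "item-1"))
```

The first library found with an available copy is chosen here; a production system could apply load balancing or other lender-selection rules instead.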

Last updated: Monday August 13 2001
© 2000 Council on Library and Information Resources