DIGITAL LIBRARY FEDERATION
FALL FORUM 2005
CHARLOTTESVILLE, VA
NOVEMBER 7—9, 2005
Omni Charlottesville Hotel
245 West Main Street
Charlottesville, VA 22902
(434) 971-5500
PRECONFERENCE: SUNDAY,
NOVEMBER 6
9:00am—4:00pm American West Partners Meeting—for project
participants
(Salon C, Lobby Level)
PRECONFERENCE: MONDAY, NOVEMBER 7
9:00am—11:30am NDIIPP Technical Architecture Affinity Group
Meeting—
for project participants (Lewis/Clark, Lobby Level)
8:30am—12:30pm DLF Aquifer Meeting—for project
participants (James Monroe, Lobby Level)
DAY ONE: MONDAY, NOVEMBER 7
10:30am—1:00pm Registration (Prefunction Area, Lobby Level)
11:30am—12:15pm First-time Attendee Orientation (Salon A, Lobby Level)
12:45pm—1:00pm Opening Remarks by David Seaman and Barrie Howard (Salons A and B, Lobby Level)
1:00pm—2:30pm
Session 1: THE NATIONAL DIGITAL
INFORMATION INFRASTRUCTURE AND PRESERVATION PROGRAM (Salon A,
Lobby Level)
Maintaining
Archive Integrity During Inter-repository Transfer: Lessons
Learned from the NDIIPP Archive Ingest and Handling Test
Martha Anderson, Project
Manager, Library of Congress, Moderator
Clay Shirky, NDIIPP
Technical Lead, NYU. PRESENTATION
Michael Nelson, Old
Dominion University PRESENTATION
Tim DiLauro, Johns Hopkins
University. PRESENTATION
Keith Johnson, Stanford
University. PRESENTATION
Stephen Abrams, Harvard
University. PRESENTATION
The Archive Ingest
and Handling Test (AIHT), a practical experience with the
proposed National Digital Information Infrastructure and
Preservation Program (NDIIPP) architecture, completed its work in
early summer 2005. Project teams from Harvard, Johns Hopkins, Old
Dominion and Stanford investigated diverse technologies and
approaches ranging from the examination of new models of
preservation with self-archiving object technologies to testing
content repository technologies as platforms for preservation.
Risk assessment tools and file evaluation and validation tools
were developed and used. A final phase of the project simulated
change over time by testing file migration to different
formats.
Surprising
challenges to basic assumptions about file and whole archive
transfer were brought to light. The tested approaches and lessons
learned about the transfer, assessment and management of a
digital archive will be discussed by principal investigators from
participating institutions and the Library of Congress.
1:00pm—2:30pm
Session 2: METADATA STRATEGIES
(Salon B, Lobby Level)
A Taxonomic
Approach to the Organization of Penn State Web Space
Michael Pelikan, Pennsylvania
State University.
Penn State
University has had broad experience with several search engines,
and currently holds a University-wide license for the Google
Search Appliance. While useful for general-purpose Web searches,
the Google Search Appliance does not currently address critical
search and retrieval issues in the Penn State Web environment.
The Taxonomic Tags
group was formed as a university-level group with representatives
from Information Technology Services, Finance & Business
Administration, and the Penn State Libraries, to examine whether
a taxonomic model, expressed as metadata tags and systematically
applied across the university's Web pages, could:
- permit specific pages to be the top hit for specific searches,
- make it easier to find specific pages from among the University's more than 1,000,000 public Web pages,
- remain useful amidst increasing adoption of content management systems across Penn State,
- remain useful over
time as search engines continue to evolve, regardless of whether open
source or commercial (and often, proprietary) search algorithms
are employed.
The Tags group has
developed recommendations to address these issues. These
recommendations include the development of a controlled
vocabulary, along with synonyms, for Penn State departments,
colleges, administrative units, etc. These terms would be
incorporated both into Web pages, and into the university's LDAP
system.
The Group has
recommended that a broadcast search mechanism be developed for
the main Penn State Web search screen. Under this system, user
search terms will be submitted both to a Web search appliance and
to the university's LDAP system. The results will be combined,
identified and presented to the user.
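The broadcast search described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two backend functions are hypothetical stand-ins for the Web search appliance and the LDAP directory, the `Hit` record and the example.edu URLs are invented for the sketch, and the merge rule (drop a URL already returned by an earlier backend) is one possible choice, not the Tags Group's design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    source: str  # which backend produced the hit, e.g. "web" or "ldap"
    title: str
    url: str


# A backend is any callable mapping a query string to (title, url) pairs.
Backend = Callable[[str], Iterable[Tuple[str, str]]]


def broadcast_search(query: str, backends: Dict[str, Backend]) -> List[Hit]:
    """Send one query to every backend and merge the labeled results."""
    merged: List[Hit] = []
    seen_urls = set()
    for source, search in backends.items():
        for title, url in search(query):
            if url in seen_urls:  # skip hits an earlier backend already returned
                continue
            seen_urls.add(url)
            merged.append(Hit(source, title, url))
    return merged


# Hypothetical stand-ins for the search appliance and the LDAP system.
def fake_web_search(query: str):
    return [("Libraries home page", "https://example.edu/libraries")]


def fake_ldap_search(query: str):
    return [
        ("University Libraries (directory entry)", "https://example.edu/dir/libraries"),
        ("Libraries home page", "https://example.edu/libraries"),
    ]


results = broadcast_search(
    "libraries", {"web": fake_web_search, "ldap": fake_ldap_search}
)
for hit in results:
    print(f"[{hit.source}] {hit.title} -> {hit.url}")
```

Labeling each hit with its source is what lets the combined list be "identified" for the user; a production version would also need ranking across sources, which this sketch leaves out.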
Members of the Tags
Group will present background on the project and update its
progress to date. The Tags Group is highly interested in
questions and comments, and will tailor the presentation to permit
as much time for discussion as possible.
Unpacking the
Interpretation of METS Markup
David Dubin, University of Illinois
at Urbana-Champaign. PRESENTATION
Like most XML
applications, METS, the Metadata Encoding and Transmission
Standard, overloads a small number of generic syntactic
relationships (e.g., parent/child) to represent a variety of
specific semantic relationships. Human beings correctly infer the
meaning of METS markup, and these understandings inform the logic
and design of applications that import, export, and transform
METS-encoded resources and descriptions.
However, METS's
flexibility and generality invite diverse interpretations, posing
challenges for processing across different METS profiles and
local adaptations. Robust processing requires support in the form
of a general software library for reasoning about METS documents.
We describe the current state of development for such a
library.
This METS
interpretation software is an application of the BECHAMEL markup
semantics framework (Dubin et al., 2003). BECHAMEL applications
translate properties and relationships expressed in conventional
markup into logical assertions that unpack the overloaded
XML-based syntax. The inference problems we aim to support
include identifying inline and external storage objects, mapping
storage objects to resources and descriptions, and correctly
classifying the role of namespaces.
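To make the idea of "unpacking" overloaded syntax concrete, here is a minimal Python sketch. It is not BECHAMEL and does not reflect its actual rules; the relation names and the `RELATIONS` table are hypothetical, chosen only to show how a generic parent/child link in a small METS fragment can be restated as a specific assertion.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

SAMPLE = """
<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="master">
      <file ID="F1" MIMETYPE="image/tiff"/>
    </fileGrp>
  </fileSec>
  <structMap>
    <div TYPE="page">
      <fptr FILEID="F1"/>
    </div>
  </structMap>
</mets>
""".strip()

# Hypothetical mapping from (parent tag, child tag) to the specific relation
# that the generic parent/child syntax stands for in that context.
RELATIONS = {
    ("fileGrp", "file"): "member_of_group",
    ("structMap", "div"): "top_division_of",
    ("div", "fptr"): "content_pointer_of",
}


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def assertions(root):
    """Return (relation, subject, object) triples inferred from the markup."""
    facts = []
    for parent in root.iter():
        for child in parent:
            relation = RELATIONS.get((local(parent.tag), local(child.tag)))
            if relation:
                facts.append((relation, local(child.tag), local(parent.tag)))
    # An fptr's FILEID attribute is a cross-reference to a file element's ID.
    file_ids = {f.get("ID") for f in root.iter(f"{{{METS_NS}}}file")}
    for fptr in root.iter(f"{{{METS_NS}}}fptr"):
        if fptr.get("FILEID") in file_ids:
            facts.append(("refers_to_file", "fptr", fptr.get("FILEID")))
    return facts


facts = assertions(ET.fromstring(SAMPLE))
for fact in facts:
    print(fact)
```

The same nesting syntax yields three different relations here, which is the overloading the abstract describes; a real interpretation library would also have to account for profile-specific conventions.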
Another goal of
explicating the interpretation of METS documents is to
reserialize them in XML, directly asserting as many of the
inferred facts as we can. In this way we hope to improve
prospects for long term digital preservation.
D. Dubin, C. M.
Sperberg-McQueen, A. Renear, and C. Huitfeldt. A logic
programming environment for document semantics and inference.
Literary and Linguistic Computing, 18(2):225-233, 2003.
The Problem with Duplicates
Esme Cowles, UCSD Libraries.
When harvesting
large numbers of non-unique metadata records from several
different institutions, duplicate records are inevitable.
Typically, duplicates are identified by comparing globally-unique
identifiers (such as ISBNs) or definitive metadata elements (such
as creator and title statements). However, some disciplines (such
as art) have neither unique identifiers nor definitive metadata,
making the task of identifying duplicate records much harder.
The Union Catalog
for Art Images (UCAI) project aggregated 920,000 metadata records
for slides and digital images from six institutions, mapped all
records to a common schema (VRA Core) and attempted to identify
and merge duplicate records. Drawing on clustering techniques,
controlled vocabularies, and string-comparison algorithms, the
UCAI team developed tools and software to compare metadata
records and identify duplicate records.
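As a rough illustration of string-comparison matching on records that lack unique identifiers, the sketch below compares normalized title and creator strings with Python's standard `difflib.SequenceMatcher`. It is not the UCAI software: the record fields, the token-sorting normalization, and the 0.85 threshold are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(sorted(tokens))


def similarity(a: dict, b: dict) -> float:
    """Similarity in [0, 1] between two records' title and creator strings."""
    key_a = normalize(a["title"] + " " + a["creator"])
    key_b = normalize(b["title"] + " " + b["creator"])
    return SequenceMatcher(None, key_a, key_b).ratio()


def likely_duplicates(records, threshold=0.85):
    """Return ID pairs whose similarity meets the (assumed) threshold."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


# Invented sample records from three hypothetical contributors.
records = [
    {"id": "A1", "title": "The Starry Night", "creator": "Gogh, Vincent van"},
    {"id": "B7", "title": "Starry Night", "creator": "Vincent van Gogh"},
    {"id": "C3", "title": "Water Lilies", "creator": "Monet, Claude"},
]

pairs = likely_duplicates(records)
print(pairs)
```

Comparing every pair is quadratic in the number of records, so at the scale of hundreds of thousands of records some form of clustering or blocking would be needed before pairwise comparison.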
The experience of
the UCAI team demonstrates the challenges of working with
metadata created without common content standards, controlled
vocabularies, or unique identifiers, and provides guidance for
content producers and aggregators.
2:30pm—3:00pm Break (Prefunction Area, Lobby Level)
3:00pm—4:30pm
Session 3: DIGITAL PRESERVATION
(Salon A, Lobby Level)
Preserving Digital Resources: Complexities and Emerging Solutions
(A View from the NDIIPP Partners Early Work) PRESENTATION
Joanne Kaczmarek, UIUC
Patricia Cruse, CDL.
Martin Halbert, Emory.
Anthony Ramirez,
University of Maryland.
Bill LeFurgy, Library of
Congress.
Jim Tuttle, NCSU.
Nan Rubin, WNET.
The general purpose
of this discussion panel is to present the challenges of digital
preservation as experienced to date by institutions engaged in
digital preservation projects through the NDIIPP initiative, and,
specifically, to examine and discuss examples of solutions
currently implemented or being considered by NDIIPP partners.
Discussion will
open with remarks from the Library of Congress on specific
challenges presented by the "digital preservation problem." After
a brief introduction of NDIIPP partners, representatives from
each project will provide concise examples of currently
implemented or pondered solutions to the specific challenges they
are encountering.
Challenges arising
include varied technical issues, as well as issues related to
workflow, rights clearance, and the economic sustainability of
preservation activities. This discussion will also include
consideration of the role the NDIIPP partnership model might play
in developing preservation strategies and solutions. The session
format is intended to foster discussion among panelists, and
audience participation is encouraged.
Libraries and
archives have traditionally played the role of "trusted
repositories," assuming long-term responsibilities for assuring
the integrity and authenticity of materials deposited with or
collected by them. With the proliferation of digital resources,
the role of a "trusted repository" takes on a new aspect,
requiring libraries, archives and other institutions to
re-conceptualize their place in providing assurances of long-term
digital preservation.
Recognizing the
need for a coordinated approach to preserving digital resources,
the Library of Congress launched a $99.8M national digital
strategy effort through the National Digital Information
Infrastructure and Preservation Program (NDIIPP). Its mission is to
"develop a national strategy to collect, archive and preserve the
burgeoning amounts of digital content, especially materials that
are created only in digital formats, for current and future
generations." In Fall 2004, the eight projects participating in
this proposed panel were awarded funding. More information about
NDIIPP and its partners may be found at http://www.digitalpreservation.gov.
3:00pm—4:30pm
Session 4: MANAGING DIGITAL LIBRARY
CONTENT AND CODE (Salon B, Lobby Level)
Flipping the Switch: Lessons Learned from a Major Digital Library Migration
Project
Jon Dunn, Mark Notess, and Ryan Scherle, Indiana University
Digital Library Program. PRESENTATION
In 2005, the
Variations2 digital music library transitioned from being only a
research project to becoming the replacement for the heavily-used
Variations online music service at Indiana University. The effort
to bring this second-generation digital library into production
included re-processing and checking approximately 10,000
digitized sound recordings, the creation of a new digital ingest
tool, and the development of an access control mechanism to
ensure appropriate copyright safeguards.
Moving nine years'
worth of digitized recordings proved to be more than a simple
matter of pointing the new tool at the old files. We had to
retrieve, and in some cases locate, the original .wav files and
re-encode them to support the superior capabilities of the new
tool. We also moved the production file server from a tape-based
system to hard disks. Re-encoding ran 24x7 for approximately two
months. Subsequent error checking and clean-up took several
months more.
The transition
provided an opportunity to reassess and improve our audio ingest
process. The new digitization tools were designed in consultation
with the digitizing staff and fit much better with the
digitization workflow, increasing throughput and improving
quality.
The Variations2
access control mechanism limits out-of-library use of copyrighted
materials according to a new access policy, based on student
course enrollment. With this access mechanism in place, we are
distributing the Variations2 client software to students to
support home access of streaming audio and scanned score
images.
This talk describes
the lessons learned, and the surprises along the way, during the
Variations to Variations2 migration.
Organizing
Project Code for Digital Library Applications
Eric Stedfeld, New York
University. PRESENTATION
Many digital
library projects suffer a circuitous evolution, starting as a
demo or proof of concept in a scripting language, then adding on
requirements and expectations until the project reaches
production status.
This can result in
disorganized code that is difficult to maintain, update, extend,
or scale, let alone transition into another coding environment
such as Java. Even more challenging, programmers familiar with
the original scripting language may have little background in the
principles and methods of the new environment that lead to good
programming practice.
This presentation
provides some approaches for better structuring and maintaining
such code, based on Java Guidelines and Patterns. The example
application, a digitized collection of Colonial and Early
American documents, utilizes servlets, JSP, JavaBeans, a database
backend, and XML files generated with the METS Java Toolkit. The
principles presented can help session participants make their
application code more manageable, extensible and scalable, while
saving time and reducing frustration in software development.
4:30pm—4:45pm Break (Prefunction Area, Lobby Level)
4:45pm—6:00pm
BIRDS OF A FEATHER SESSION 1
1) The DLF Electronic Resource Management
Initiative: Phase 2 (Salon A, Lobby Level)
Adam Chandler, Cornell. PAPER
This presentation
will describe the scope of Phase 2 of the DLF Electronic Resource
Management Initiative ("ERMI 2"), including timeline, objectives
and deliverables.
2) OCKHAM: Digital Library Service
Registries (Salon B, Lobby Level)
Jeremy Frumkin, Oregon State
University. PRESENTATION
Martin Halbert, Emory
University.
The Ockham Project
will hold a BOF session to explore Digital Library Service
Registries (DLSRs) at the DLF Fall Forum. With the growing
popularity of metasearch tools, OAI-PMH available collections,
the use of OpenURL resolvers, and the emergence of new efforts
such as COinS, there is a growing need for registries to support
access to these services. This BOF will focus on the concept of
the DLSR and the functions a DLSR supports, and will examine
current DLSR efforts, including the OCLC OpenURL registry, the
JISC/IESR DLSR, and the Ockham Distributed DLSR.
In addition, we
will discuss how DLSRs might play a key role in enabling new
digital library functionality. Combined with the concept of
"Autodiscovery" (techniques for automatically finding
machine-processable resources associated with a particular web
page), can we utilize DLSRs to lower the barriers to information
integration while at the same time enabling greater and more
complex information workflows? Can we create a "digital library
dialtone" which makes connecting digital library services and
content as easy as placing a phone call? Come to this BOF to find
out!
3) Archival Information
Control (Ashlawn/Highlands, Lobby Level)
Stephen Davis, Columbia, with
Ellie Brown and Karen Calhoun, Cornell.
A discussion about
how to get control institutionally over finding aid creation and
management as well as the full lifecycle of archival collection
information. Lee Mandell will also report briefly on the status
of the Mellon-funded Archivists Toolkit project.
4) Digital Preservation for
Photographs (Lewis/Clark, Lobby Level)
Erin Rhodes, U.S. National Archives
and Records Administration. PRESENTATION
David Seaman, DLF.
Last year, NARA
produced a very well received electronic publication, Technical
Guidelines for Digitizing Archival Materials for Electronic
Access: Creation of Production Master Files - Raster Images
(Steven Puglia, Jeffrey A. Reed, and Erin Rhodes, the U.S.
National Archives and Records Administration, June 2004),
subsequently issued as a print document by DLF (see http://www.diglib.org/pubs/dlfpubs.htm#nara-raster).
This report
addresses a spectrum of considerations for digitizing a variety
textual and photographic records, including file formats, image
capture, metadata, and quality assessment. In April 2005, a team
of experts from across the DLF and beyond, including Harvard,
NARA, LC, Kodak, Chicago Albumen Works, RLG, RIT, and the Swiss
Federal Institute of Technology, convened to build on this work
to produce a guide for the digitization of photographs for
preservation reformatting. This session allows you to hear about
our plans, and have input into the process. You will also be able
to preview a proposed new quality control target designed for
image digitizing workflows in libraries and museums.
7:00pm—9:30pm Reception (Harrison Institute, University of
Virginia)
Note:
Round-trip transportation will be provided from the Omni hotel to
the reception site on the University of Virginia campus.
DAY TWO: TUESDAY, NOVEMBER 8, 2005
8:00am—9:00am Breakfast (Atrium, Lobby Level)
9:00am—10:30am
Session 5: REMODELING DIGITAL LIBRARY
SYSTEMS (Salon A, Lobby Level)
Re-architecting a Digital
Library System for XML/XSL and Unicode:
Lessons Learned
Phil Farber, Alan Pagliere, Chris Powell, John Weise, and Perry Willett
(all University of Michigan). PRESENTATION
The University of
Michigan digital library system was developed in the 1990s to
process SGML files using Perl scripts. At Michigan, this system
provides access to over 20,000 texts, 250,000 images, and 850
archival finding aids. In addition, this system is used at 28
other institutions.
This presentation
will describe the transformation of our digital library system
for XML/XSL and Unicode, and the lessons learned. Panelists will
describe the original system, discuss the reasons for this major
undertaking, and cover topics such as the planning process,
how production systems were maintained during the
re-architecture, large-scale data conversion from SGML/ISO-Latin1
to XML/Unicode files, tools for conversion, processing, version
control and debugging, testing, and what they've learned in the
process.
9:00am—10:30am
Session 6:
DLF AQUIFER (Salon B, Lobby Level)
Katherine Kott, DLF Aquifer. PRESENTATION
Leslie Johnston, UVA.
Sarah Shreeves, UIUC. PRESENTATION
Jon Dunn, Indiana.
Martin Halbert, Emory.
DLF Aquifer, the
Digital Library Federation distributed open digital library, is in
implementation. Since the DLF Spring Forum, project working
groups have created a collection development policy, a DLF
Aquifer MODS metadata profile to support service development and
have selected a small set of digital collections to use as an
initial test-bed. University of Michigan will begin harvesting
metadata for the project soon.
The services
working group has identified target audiences, developed use
cases and surveyed DLF libraries to learn what is already known
about digital collection use. Taking their cues from the services
working group, the technology/architecture working group
completed a draft of architectural principles and selected the
"repository neutral" framework designed at Johns Hopkins as the
DLF Aquifer "middleware layer".
This panel will
review the accomplishments of the past six months and outline the
phase I deliverables that will be demonstrated at the DLF Spring
Forum in Austin next April.
10:30am—11:00am Break (Prefunction Area, Lobby
Level)
11:00am—12:30pm
Session 7: DYNAMIC DIGITAL ENVIRONMENTS
(Salon A, Lobby Level)
A Format-registry Based
Automated Workflow for the Ingest and Preservation of Electronic
Journals
Evan Owens, Chief Technology
Officer, Portico. PRESENTATION
Portico (http://www.portico.org), with
funding from The Andrew W. Mellon Foundation, Ithaka, and JSTOR,
has developed an automated workflow for the ingest of
publisher-supplied e-journal source files into a preservation
repository. Electronic journals as a preservation challenge sit
somewhere between traditional digitization projects and
Web-harvesting projects in that the formats are known and
controlled but by the content provider rather than by the
archive. The workflow that Portico has developed builds on
concepts that were developed in the preliminary work towards a
Global Digital Format Registry (GDFR) and on the JHOVE tool set.
The components of the workflow include package disassembly,
format identification and verification, structure mapping,
automated metadata harvesting, rule-based format normalization,
and support for quality control and inspection. The system
implementation uses a service-based architecture built upon a
format registry and a tool registry with support for distributed
and pluggable tools. This presentation will review the workflow
and system design and discuss our experience in designing and
building a system based on a format registry.
WikiD—Applying Wiki Principles to Structured Data
Jeffrey A. Young, OCLC Online
Computer Library Center, Inc. PRESENTATION
Ward Cunningham
describes a wiki as "the simplest online database that could
possibly work". The cost of this simplicity is that wikis are
generally limited to a single collection containing a single kind
of record (viz. WikiMarkupLanguage records). WikiD extends the
Wiki model to support multiple collections containing arbitrary
schemas of XML records with minimal additional complexity.
WikiD is
essentially a lightweight framework combining:
- Open-source implementations of various loosely-coupled open-standard
protocols (e.g. OpenURL, SRW/U, SRW Update, OAI-PMH, RSS)
- An open-source version-controlled database.
- A set of bootstrap collections:
- CollectionCollection - the master collection of all collections
defined in WikiD
- CollectionExternalSchemas - a registry of XML Schemas used to
constrain the items in WikiD collections
- CollectionWikiPages - the default collection that not only
provides WikiD's conventional out-of-the-box wiki functionality
but also acts as the user interface for the creation and
maintenance of other collections.
- XSL Stylesheets to
render collection-level open-standard protocol responses into
HTML for human consumption. Automated processes can ignore the
stylesheet reference and use the open-standard protocol responses
directly.
Possible
applications for WikiD include collaborative maintenance of
registries, thesauri, taxonomies, reviews, and documentation. In
addition to a standard set of features available for all
collections, custom code (e.g. Java or XSL) can also be assigned
to provide new types of Web services related to individual
collections.
The WikiD project
page can be found at http://www.oclc.org/research/projects/wikid/default.htm.
A demo is running at http://alcme.oclc.org/wikid/.
Instructions for creating a new collection can be found at
http://alcme.oclc.org/wikid/DemoInstructions.
A J2EE Web app distribution is in the works.
11:00am—12:30pm
Session 8: COLLABORATIVE METADATA AGGREGATIONS (Salon
B, Lobby Level)
Collaborative Metadata Aggregations: The Road to Shareable Metadata
Sarah L. Shreeves, University of
Illinois at Urbana-Champaign, Moderator
Bill Landis, California
Digital Library
PRESENTATION
Trish Rose, University of
California at San Diego
PRESENTATION
Timothy Cole, University of
Illinois at Urbana-Champaign
PRESENTATION
Jenn Riley, Indiana
University PRESENTATION
DLF emphasizes the
role of collaboration to better understand how best to share our
digital content and metadata. To this end it has supported the
writing of the DLF-NSDL Best Practices for OAI Data Provider
Implementations and Shareable Metadata in order to cope with the
most common difficulties in the exchange of metadata between
content providers and aggregators. This best practices work has
drawn on the experiences of collaborative projects involving
metadata sharing and has highlighted the importance of such
projects to facilitate the dialogue between content providers and
aggregators.
This panel will
briefly highlight the experiences of several OAI based and
non-OAI based collaborative aggregations and best practices
building initiatives and will then turn to an open discussion of
the issues facing these collaborative projects and initiatives
and how they help foster more efficient mechanisms for sharing
metadata.
12:30pm—2:30pm Break for Lunch (Individual Choice)
Note: There
is an open-air, pedestrian mall just outside the Omni hotel where
there are many restaurants for lunch choices.
2:30pm—4:00pm
Session 9: NARA'S ELECTRONIC RECORDS
ARCHIVES (Salon A, Lobby Level)
Metadata
Implementation Perspectives for the ERA System
Quyen Nguyen, Systems Engineering
Division, ERA Program Management Office, U.S. National Archives
and Records Administration. PRESENTATION
With the advent of
Information Technology, more and more records today are born
digital. In order to continue to fulfill its mission in the computer
age, the U.S. National Archives and Records Administration (NARA)
has made the decision to develop the Electronic Records Archives
(ERA) system.
The ERA system
represents an endeavor undertaken by the agency to preserve
digital records, and make those records accessible independently
of the hardware and software with which they were created. Metadata is
an important element in such a system whose core functionality is
digital preservation for long term access by the public.
This paper will
discuss the potential issues and impact of implementing metadata
in ERA from the perspectives of system architecture, data
management, and software design.
We also present the
information technologies that we are considering for the
implementation of metadata within the ERA system such as XML and
Web services. By referring to the OAIS information model, we will
look at different types of metadata, and how the system could
support the creation and maintenance of these metadata
automatically and manually via workflow. Meeting the security
requirements with different levels of data and metadata
classification also constitutes a challenge in the system design
process. The decision of how to store metadata vis-à-vis
records and record aggregates will significantly impact the
software design object model, the storage size, and data
replication. Metadata as well as data replication are critical to
ensure the availability and safeguarding of the archival
records.
Indexing and
Search Implementation Perspectives for the ERA System
Dyung Le, Systems Engineering Division,
ERA Program Management Office, U.S. National Archives and Records
Administration. PRESENTATION
With the advent of
Information Technology, more and more records today are born
digital. In order to continue to fulfill its mission in the
computer age, the U.S. National Archives and Records
Administration (NARA) has made the decision to develop the
Electronic Records Archives (ERA) system. The ERA system
represents an endeavor undertaken by the agency to preserve
digital records over an indefinite period of time, and make those
records accessible independently of the hardware and software
with which they were created. The capability for indexing and
searching of its assets is an important element in a system whose
core functionality is digital preservation of electronic records
for long term access by NARA and the public.
This paper will
discuss the potential issues and impacts related to the
implementation of Indexing and Search functionality in ERA from
the perspectives of system architecture, software design,
usability, and long term maintainability. We will present the
information technologies represented by the major vendors of
Enterprise Search COTS that we are analyzing for possible
implementation of Indexing and Search services within the ERA
system.
Meeting the
information retrieval needs of the diverse, and potentially huge,
ERA user community, given the resource limitations of ERA, is a
serious challenge. We will discuss options being considered by
NARA to meet this challenge. Meeting the security requirements of
a solution which must house records with different levels of
classification or that contain sensitive information also
constitutes a challenge for the Indexing and Search service. ERA
is intended to exist for an essentially indefinite period of
time, and its service oriented architecture provides the
flexibility to evolve over time as technology changes, including
changing out COTS products. There are no current or emerging
standards (other than for metadata) governing the Enterprise
Search arena. Hence there is a real danger of becoming locked
into a particular Enterprise Search vendor's proprietary
approach. The paper will discuss the related technical issues and
possible mitigation.
2:30pm—4:00pm
Session 10: OAI FOR DIGITAL LIBRARY AGGREGATION (Salon B, Lobby Level)
David Seaman, DLF
PRESENTATION
Kat Hagedorn, Michigan
PRESENTATION
Martin Halbert, Emory
PRESENTATION
Sarah Shreeves, UIUC
PRESENTATION
Tom Habing, UIUC
PRESENTATION
Perry Willett, Michigan
DLF in partnership
with Emory, Michigan, and UIUC, is researching, designing, and
prototyping a "second generation" OAI finding system,
capitalizing on the lessons learned from the first wave of OAI
harvesting and using as its raw material collections drawn from
across the DLF membership. The aim is to foster better teaching
and scholarship through easier, more relevant discovery of
digital resources, and enhance libraries' ability to build more
responsive local services on top of a distributed metadata
platform.
This panel will
update the DLF community on the progress of this work, and
solicit feedback while we are still in medias res. The major
deliverables will be described and demonstrated such as they can
be, with particular emphasis on the first three, which are
furthest along at this point:
1) Best Practices
guidance for OAI use in libraries, with particular emphasis on
the recommendation that we adopt MODS as the metadata schema to
convey the richness of description that we are convinced we need
to build OAI records that truly support innovative scholarship.
The first version of the Best Practices document will be
available online by the Forum and in print soon after, as grant
deliverables. Emory University and the other DLF IMLS partners
have also developed a curriculum series of OAI best practices
training materials. These materials will be used to train staff
and coordinate activities of DLF libraries interested in sharing
metadata concerning their digital collections, and are intended to
be shared with the larger digital library community.
2) A pair of portal
prototype finding systems, informed heavily by feedback received
from the grant-funded Scholars Advisory Panel, http://www.diglib.org/architectures/oai/imls2004/OAISAP05.htm.
One portal, The DLF OAI Portal, offers a single place to access
all OAI records (items and collections) from DLF institutions:
http://www.hti.umich.edu/cgi/b/bib/bib-idx?c=imls;page=simple
The second, in production now, takes 330,000 MODS-based OAI
records from four DLF institutions and is building a prototype
service that reflects the service and functional desires of our
scholarly team.
3) An Experimental
OAI Registry at UIUC of use principally to builders of OAI
services: http://gita.grainger.uiuc.edu/registry.
The most significant recent additions to the registry are rich,
human-generated collection descriptions for many of the DLF
member OAI data providers, including description of select
subsets. These data are browsable via the registry Web interface
or as XML which conforms to the DC Collection Description
profile.
4) A Survey of
Digital Library Aggregation Services, version 2: as part of the
grant, Martha Brogan is revisiting her 2003 Survey, and we will
be publishing the results early in 2006.
As the grant
progresses, we are also expecting to look at
auto-characterization of data, Web services, and interfacing our
prototype systems with Google. Evaluation will be a significant
component later in the grant period.
4:00pm—4:30pm Break (Prefunction Area, Lobby Level)
4:30pm—6:00pm BIRDS OF A FEATHER SESSION 2
5) Open Archives Initiative Protocol for
Metadata Harvesting: Best Practices for Data Provider
Implementations and Shareable Metadata (James Monroe, Lobby
Level)
Sarah L. Shreeves, University of
Illinois Library at Urbana-Champaign
A working group
made up of members of the DLF and NSDL and representing both
service and data providers has been developing a set of best
practices for OAI data provider implementations and shareable
metadata (http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents).
Specifically, the Best Practices for OAI
Data Provider Implementations (
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?DataProviderPractices)
offer guidelines and recommendations for a range of integral and
optional pieces of the protocol (deleted records, sets,
datestamps, descriptive containers). The Best Practices for
Shareable Metadata (
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?IntroductionMetadataContent)
outline general guidelines for authoring metadata that is useful
and effective within larger aggregations.
These best
practices are now beginning to undergo public review and comment
in preparation for publication in the coming year. This session
is open to anyone who would like to discuss these best practices
and ask questions or raise concerns with members of the working
group.
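For readers unfamiliar with the protocol pieces named above (sets,
datestamps, deleted records), the following is a minimal sketch and not
part of the working group's guidelines. The base URL, set name, and sample
response are hypothetical and invented for illustration; only the
OAI-PMH 2.0 request parameters and response element names are taken from
the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None, until_date=None):
    """Build a selective-harvest ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # restrict the harvest to one set
    if from_date:
        params["from"] = from_date        # datestamp lower bound
    if until_date:
        params["until"] = until_date      # datestamp upper bound
    return base_url + "?" + urlencode(params)


# Hypothetical response fragment: one live record and one deleted record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-11-08T16:30:00Z</responseDate>
  <request verb="ListRecords">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2005-10-01</datestamp>
        <setSpec>maps</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:2</identifier>
        <datestamp>2005-10-15</datestamp>
        <setSpec>maps</setSpec>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def summarize_headers(xml_text):
    """Return identifier, datestamp, sets, and deleted flag per record header."""
    root = ET.fromstring(xml_text)
    rows = []
    for header in root.iter(OAI_NS + "header"):
        rows.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            "sets": [s.text for s in header.findall(OAI_NS + "setSpec")],
            # A deleted record carries status="deleted" and no metadata.
            "deleted": header.get("status") == "deleted",
        })
    return rows


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai",
                           set_spec="maps", from_date="2005-10-01"))
    for row in summarize_headers(SAMPLE_RESPONSE):
        print(row)
```

A harvester that honors the deleted flag can remove withdrawn items from
its aggregation, which is one reason consistent handling of deleted
records and datestamps matters to service providers.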
6) Preservation Planning for Digital
Objects and Repositories (Ashlawn/Highlands, Lobby Level)
Taylor Surface, OCLC
Stephen Abrams, Harvard.
Many of us are
implementing or operating repositories of digital materials with
an eye toward preserving the objects for the long term. While
there has been much theoretical discussion on practices, those of
us with these repositories now face the very real challenge of
providing these services. Come to this BOF to share your current
practice for digital preservation planning and discuss
opportunities for creating best practice.
7) METS Implementation and METS
Profile Building (Lewis/Clark, Lobby Level)
Nancy J. Hoebelheinrich,
Stanford University Libraries.
The METS Editorial
Board will hold a Birds of a Feather session on Technical Issues
related to METS implementation and METS Profile Building. Members
of the METS Editorial Board who have successfully written,
registered and implemented METS profiles will be in attendance,
as well as members of the METS community who are in the process
of developing METS profiles (including Arwen Hutt from UCSD and
Rob Wolfe from MIT/DSpace). Topics to be discussed include
identifying the purpose and function that a METS profile can
serve in local implementations of METS, how and whether local
workflow should influence the development of a profile, and how
profiles are designed to facilitate content and metadata sharing
among members of the METS community. Lively discussion will be
encouraged!
DAY THREE: WEDNESDAY,
NOVEMBER 9
8:00am—9:00am Breakfast (Atrium, Lobby Level)
9:00am—10:30am
Session 11:
DIGITAL LIBRARY INITIATIVES IN SOUTH AFRICA AND ENGLAND
(Salon A, Lobby Level)
Herding Big
Cats: An African Experience of Collaboration
D. P. Peters, University of KwaZulu
Natal. PRESENTATION
DISA, Digital
Imaging South Africa, is a national collaborative digitization
project, funded by The Andrew W. Mellon Foundation, to make
the repressed archival documentation of the apartheid era in
South Africa available for international research. Some 70,000
pages have already been made available from http://disa.nu.ac.za.
In partnership with
Aluka, a project of Ithaka Harbors Inc., the project has recently
embarked upon a second phase, to make available research
resources of neighboring regimes under the regional topic,
Southern African Freedom Struggles, 1950-1994. This presentation
will share an African experience of collaboration, in building
partnerships with users to provide context, and with institutions
to provide content.
The South African
apartheid system ranks among the crimes against humanity that
history and responsible stewardship must prevent in
the future. Digital technologies ideally serve this aim in the
dissemination of information, but the process of building
collaboration is not without pitfalls, beyond the organizational
challenges.
The DISA project
has recently focused on developing a common understanding amongst
librarians, archivists, scholars and politicians, of its role in
interpreting a sensitive and painful period of history.
Engagement of the
scholarly community serves to build contextual layers in the
information architecture, with descriptive essays linked to the
archival resources by means of topic maps. The objective is to
build a research resource for teaching and learning, stimulating
curriculum development in this subject area. But archival
concerns for a perceived loss of ownership must be juggled with
access, and local heritage preservation with global cultural
imperialism.
This presentation
will investigate some of the benefits and pitfalls experienced in
building new user communities in national and international
collaboration.
Digital Library
Activities at Oxford
Michael Popham, Oxford University.
PRESENTATION
The libraries of
Oxford University have a long-standing interest in digital
technologies. Large-scale digitization projects undertaken more
than a decade ago such as "Early Manuscripts at Oxford" (http://www.image.ox.ac.uk/) are
still attracting a growing number of users who wish to access
selected items from our extensive holdings. However, by 2000 it
had become apparent that the hitherto piecemeal and largely
project-based approach to the selective digitization of material
was not going to be sustainable in the longer term. Quite apart
from the resources required to support and maintain dozens of
separate Web sites built on a variety of applications and data
standards, it was clear that digital surrogates were becoming
acceptable to the scholarly community and greatly increasing
public access to material that most readers would be unlikely to
see first hand. A new approach was required.
In the summer of
2001, Oxford University Library Services established the Oxford
Digital Library (ODL): a combination of services and technologies
intended to develop, test, and implement the policies and
standards that would underpin a University-wide framework for the
digitization of library holdings. Thanks to a generous grant from
the Andrew W. Mellon Foundation, a Development Fund was
established to create a testbed of core content for the ODL
intended to be used by researchers, teachers, and the global
community of learners.
The initial 4-year
development phase of the ODL concludes in October 2005, and this
presentation will outline the lessons we have learned to date,
the implications for the way the ODL is likely to develop, and
also look at the impact of such endeavors as the Oxford-Google
digitization partnership.
This presentation
will provide an update on digital library developments at the
University of Oxford, outline the lessons that have been learned
from the initial four-year development phase of the Oxford
Digital Library, and discuss the likely impact of the
Oxford-Google digitization agreement.
9:00am—10:30am
Session 12: WEB
ARCHIVING SERVICES (Salon B, Lobby Level)
Martha Anderson, Library of
Congress. PRESENTATION
John Tuck, The British
Library. PRESENTATION
Taylor Surface, OCLC.
PRESENTATION
John Kunze, CDL. PRESENTATION
Web archiving
services emerging at a number of different institutions will
enable librarians and other document selectors to extend their
historic collection-building roles into the domain of web-based
materials. Such services will allow curators to initiate and
monitor web crawls relevant to specific topic areas, analyze and
annotate harvested data, and search and browse local archives
built from sites that may have been harvested multiple times.
1) "Introduction to
Web Archiving" (Martha Anderson): a brief overview of the current
landscape of challenges and opportunities of archiving web
resources.
2) "Web Archiving
at the British Library" (John Tuck): The British Library is lead
partner in the UK Web Archiving Consortium (UKWAC) (www.webarchive.org.uk) and is a
member of the International Internet Preservation Consortium
(IIPC).
The focus of the
presentation will be on collaborative working nationally and
internationally. There will be specific reference to the
challenges faced by UKWAC in areas such as permissions and legal
deposit, software, and collection development and, in the case of
IIPC, to current initiatives including progress on procurement
for an automated smart crawler in conjunction with the
Bibliothèque nationale de France.
3) "UIUC/OCLC's
ECHO DEPository Project" (Taylor Surface): OCLC, as part of the
ECHO DEPository NDIIPP project, is leading the development of a
suite of open source web archiving tools named the Web Archives
Workbench, which is based on an archival selection model
developed at the Arizona State Library. OCLC will discuss the
challenges facing state libraries in the collection of web
information and review how the tools of the Web Archives
Workbench help with those challenges.
4) "CDL's Web
Archiving Service" (John Kunze): An overview of CDL's Web
Archiving Service (WAS) and its approach to long-term
preservation. The approach includes generating "desiccated data"
(long-lived, low-tech derivatives for certain formats), defining
service levels, assigning persistent identifiers, and replicating
content at geographically distant locations.
10:30am—11:00am Break (Prefunction Area, Lobby
Level)
11:00am—12:30pm
Session 13: SUSTAINING DIGITAL
SCHOLARSHIP (Salon A,
Lobby Level)
Bradley Daigle, Mike Furlough, Thornton Staples, and Madelyn Wessel
(all University of Virginia Library).
Sustaining Digital
Scholarship ("SDS") is a project at the University of Virginia
Library that explores the complex technical, legal, institutional
and policy issues arising for libraries in the development and
formal collection of original digital scholarship. In the
humanities, these born-digital scholarly efforts tend to look
less like existing genres based on print models (e.g.,
monographs, articles) and more like exhibitions, library
collections, and thematic research archives. Such projects
challenge us to develop consistent methods for production,
delivery, rights management, access, and archiving of digital
content of multiple media and content types. To sustain original
digital scholarship, we assume that we must move beyond providing
individual piecemeal solutions to define standard methods for
collecting these projects by the library.
SDS is a
collaboration among the University of Virginia Library, NINES
(Networked Interface for Nineteenth Century Electronic
Scholarship), the Tibetan and Himalayan Digital Library, and the
Virginia Center for Digital History. Pilot projects under SDS
assume (1) that the library will formally select, collect,
preserve and distribute original digital scholarly projects
through a digital library architecture based upon Fedora; (2)
that intellectual property rights of those projects allow open
access to the broadest extent possible; (3) that the library will
strive to preserve the intellectual content, structures, and
designs of the project; and (4) that the library will elaborate
formal collection agreements with the scholars and possibly other
institutions.
Mike Furlough will
moderate and discuss the overall aims of the SDS project at
Virginia; Thornton Staples will outline the project's theory of
collection and content aggregation; Madelyn Wessel will review
the policy and legal issues that the project raises for libraries
and scholars; Bradley Daigle will discuss the implementation of
pilot projects and expected outcomes.
11:00am—12:30pm
Session 14: ARCHIVE-IT: A WEB ARCHIVING APPLICATION
(Salon B, Lobby Level) PRESENTATION
Archive-it
Merrilee Proffitt, RLG,
Moderator
Michele Kimpton, Director
Web Archive, Internet Archive.
Carolyn Palaima, LANIC
Project Director, University of Texas at Austin.
Cecile Jagodzinski,
Indiana University.
Kathy Jordan, Electronic
Resources Manager, Library of Virginia.
Dan Avery, Senior Crawl
Engineer, Internet Archive.
Archive-it is a Web
application uniquely designed for the needs of university and
government institutions interested in preserving Web content.
The application allows organizations with limited infrastructure
and technical staff to collect, catalogue, search and manage
archived Web content through a Web interface.
The Internet
Archive (IA), a nonprofit that manages the largest publicly
available Web archive, developed Archive-it. IA currently
provides these services to large institutions such as the Library of
Congress and the US National Archives. It is working with RLG and
a handful of other organizations to make the same service
available at a scale and cost that is broadly accessible. RLG
member institutions participating in this pilot are Indiana
University, the International Institute of Social History,
Swarthmore College (with partner Haverford College), and the
University of Toronto.
Other pilot
participants working directly with the Internet Archive include
the Library of Virginia, University of Texas at Austin, and North
Carolina State Archives. The pilot run of the service is
scheduled to conclude in November 2005, and the service is
scheduled to launch in January 2006.
This presentation
will include an overview of Archive-it and its major functions;
pilot participants will give an overview of why they are
interested in Web archiving, challenges they face in their own
institutions regarding Web archiving, what they've learned so far
using the Archive-it Web application, and how it's being applied
in their institutions. By the time of this panel, participants
will be able to discuss their experience with Archive-it and the
challenges of Web archiving in general, and to provide information
and informed perspectives to audience members.
12:30pm
Adjourn
POST-CONFERENCE, NOVEMBER
9
1:00pm—5:00pm METS Editorial Board Meeting (Monticello, Lobby
Level)
12:30pm—4:30pm OAI Vendors' Panel—for project participants
(Ashlawn,
Lobby Level)
1:00pm—5:00pm DLF Developers' Forum Meeting—for project
participants
(James Monroe, Lobby Level)
1:30pm—5:30pm CCS docWORKS Workshop—for project
participants
(Computer Classroom, Fourth Floor, Alderman Library, University
of Virginia)
POST-CONFERENCE, NOVEMBER
10
9:00am—5:00pm METS Editorial Board Meeting
(James Monroe, Lobby Level)
8:30am—3:30pm DLF OAI Implementers Workshop—for project
participants
(Computer Classroom, Fourth Floor, Alderman Library, University
of Virginia)