DIGITAL LIBRARY FEDERATION
SPRING FORUM 2006
AUSTIN, TX
APRIL 10 – 12, 2006
The Driskill Hotel
604 Brazos Street
Austin, Texas 78701
(800) 252-9367
Floor Plan
PRECONFERENCE: Monday, April 10
8:30 a.m. – 11:30 a.m.
DLF Aquifer Metadata Working Group Meeting—for project participants (Governor's Boardroom)
DLF Aquifer Services Working Group Meeting—for project participants (Maximilian Room)
DLF Aquifer Technology/Architecture Working Group Meeting—for project participants (Chisholm Trail Room)
DAY ONE: Monday, April 10
10:30 a.m. – 1:00 p.m. Registration (Mezzanine)
11:30 a.m. – 12:15 p.m. First-time Attendee Orientation (Driskill Ballroom)
12:45 p.m. – 1:00 p.m. Opening Remarks (Driskill Ballroom)
1:00 p.m. – 2:30 p.m.
Session 1: PANEL: Developers' Forum Panel: Global Identifier Resolution. (Driskill Ballroom)
Tim DiLauro, Moderator, Johns Hopkins University;
John Kunze, California Digital Library [presentation]; Eva Müller, Uppsala Universitet [Sweden][presentation]; and Herbert Van de Sompel, Los Alamos National Laboratory Research Library [presentation]
Digital object access via stable
identifiers is an important problem for all digital libraries.
The automatic mapping of identifiers to information objects,
known as “resolution”, is complicated by the diversity of
available identifier schemes, resolution technologies, and
expected uses.
A long-standing challenge for digital
libraries is how to make resolution more stable and deterministic
for the information objects they steward. Unable to control other
providers' services, we struggle to make ongoing choices among
providers, their objects and identifiers—the “Their Stuff”
problem. Conversely, we also struggle to set up our own services
so as to provide the best resolution experience to our users—the “Our Stuff” problem.
For example, in the “Their Stuff”
category, a large amount of metadata (and more and more often,
actual content) is being aggregated and indexed based on both
proprietary and open harvesting protocols such as OAI-PMH.
Because of the potential to harvest non-URL-based identifiers
(e.g., URN:NBN, Handle) and the absence of a standard mechanism
that can resolve all (or even most) of them, it is generally
necessary to find a URL equivalent for each digital object in the
harvested metadata. This makes it difficult to do things such as
resolving to one of a number of copies, depending on which is
available at a given time.
Two possible approaches to solving
this and similar problems would be to generalize and/or
centralize resolution. Creating a more generalized mechanism
would make it easier to develop common practice—and common
code—across many content stores with many identifier types.
Developing a more centralized solution would obviate the need for
every system that operates on identifiers to implement its own
complete set of resolution services. These approaches might even
encourage new service models.
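By way of illustration, the sketch below shows one shape a generalized, scheme-agnostic resolution layer could take: an identifier is matched against a registry of scheme handlers, each of which yields one or more candidate URLs. The prefixes and resolver endpoints shown are illustrative assumptions, not a description of any panelist's system.

```python
# Minimal sketch of a scheme-agnostic identifier resolver (illustrative only).
# The registry maps identifier prefixes to URL templates; a real deployment
# would consult authoritative resolvers and might return several candidate copies.

RESOLVER_REGISTRY = {
    "hdl:":     "http://hdl.handle.net/{id}",              # Handle system resolver
    "doi:":     "http://dx.doi.org/{id}",                  # DOI resolver
    "urn:nbn:": "http://nbn-resolving.org/urn:nbn:{id}",   # example NBN resolver
    "ark:/":    "http://example.org/ark:/{id}",            # hypothetical ARK endpoint
}

def resolve(identifier):
    """Return candidate URLs for an identifier, regardless of its scheme."""
    lowered = identifier.lower()
    for prefix, template in RESOLVER_REGISTRY.items():
        if lowered.startswith(prefix):
            local_part = identifier[len(prefix):]
            return [template.format(id=local_part)]
    # Fall back to treating the identifier itself as an actionable URL.
    return [identifier] if lowered.startswith("http") else []

print(resolve("hdl:2027/mdp.39015012345678"))
```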
The speakers on this panel will
discuss some new approaches to global identifier resolution. They
will address such issues as generalized, scheme-agnostic
mechanisms, resolving to different copies of an object, and
persistence.
Session 2: PANEL:
Implementing the PREMIS Data Dictionary. (Citadel I and II)
[joint presentation of Caplan and Guenther] Priscilla Caplan, Florida Center for
Library Automation; Rebecca Guenther, Library of Congress;
Nancy Hoebelheinrich, Stanford University [presentation]; and Marcus
Enders, Niedersächsische Staats- und Universitätsbibliothek Göttingen [first presentation] [second presentation]
In May 2005, the PREMIS Working Group
(Preservation Metadata: Implementation Strategies) released
its Data Dictionary for Preservation Metadata, which defines and
describes an implementable set of core preservation metadata with
broad applicability to digital preservation repositories. In Nov.
2005, this international working group, comprised of 30 members
from five countries, won the prestigious Digital Preservation
Award, sponsored by the Digital Preservation Coalition and part
of the UK Conservation Awards. This presentation/panel will
discuss progress and problems in implementing the PREMIS data
dictionary and some of the implementation choices to be made,
with a particular focus on its use in METS. It will consist of a
brief high level introduction to PREMIS and a panel discussion of
two implementations and their similarities and differences.
- Introduction to PREMIS: Priscilla Caplan (Florida Center for
Library Automation) Overview of PREMIS, its assumptions and its
neutrality in terms of any particular implementation. Choices for
implementation will be reviewed (i.e., using the PREMIS schema
published on the maintenance agency (MA) site; incorporating pieces of
the schema into METS; or incorporating it into another framework such as
DIDL).
- Use of PREMIS with METS: A panel of three will discuss how
the PREMIS data elements might be incorporated into METS. Marcus
Enders (Niedersächsische Staats- und Universitätsbibliothek Göttingen) will discuss the MathARC implementation. Nancy
Hoebelheinrich (Stanford University) will present
Stanford's implementation of PREMIS in METS. Rebecca Guenther
(Library of Congress) will outline the general issues to be
considered in implementing PREMIS in a METS context and review
how the two applications have approached it similarly and
differently. The panel will then discuss the various approaches
and take questions.
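For readers unfamiliar with the "pieces of PREMIS inside METS" option, the fragment below sketches how a PREMIS object entity might be wrapped in a METS amdSec. It is a minimal illustration; the namespace URIs and element choices are assumptions that should be checked against the current METS and PREMIS schema documentation.

```python
# Sketch: embedding a PREMIS object entity inside a METS amdSec (illustrative).
# Namespace URIs and element choices are assumptions, to be verified against
# the published METS and PREMIS schemas.
import xml.etree.ElementTree as ET

METS   = "http://www.loc.gov/METS/"
PREMIS = "http://www.loc.gov/standards/premis"   # assumed PREMIS 1.x namespace

ET.register_namespace("mets", METS)
ET.register_namespace("premis", PREMIS)

mets = ET.Element(f"{{{METS}}}mets")
amd  = ET.SubElement(mets, f"{{{METS}}}amdSec", ID="AMD1")
tech = ET.SubElement(amd,  f"{{{METS}}}techMD", ID="TECH1")
wrap = ET.SubElement(tech, f"{{{METS}}}mdWrap", MDTYPE="PREMIS")
xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")

obj = ET.SubElement(xml_data, f"{{{PREMIS}}}object")
idn = ET.SubElement(obj, f"{{{PREMIS}}}objectIdentifier")
ET.SubElement(idn, f"{{{PREMIS}}}objectIdentifierType").text = "local"
ET.SubElement(idn, f"{{{PREMIS}}}objectIdentifierValue").text = "file-0001"

print(ET.tostring(mets, encoding="unicode"))
```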
2:30 p.m. – 3:00 p.m. Break (Mezzanine)
3:00 p.m. – 4:30 p.m.
Session 3: PANEL:
Libraries and Publishing—Reports from the
Field. (Driskill Ballroom)
Maria Bonn, University of Michigan [presentation];
David Millman, Columbia University [presentation]; Catherine Mitchell,
California Digital Library [presentation]; and David Ruddy, Cornell University [presentation]
For several years, a number of DLF
member libraries have been exploring active roles in the
scholarly publishing domain. These efforts were sparked by shared
concerns: increasing costs, diminishing access, loss of control
of scholarly content, greater consolidation of commercial
publishing—in general, an environment that appeared increasingly
restrictive, expensive, and unsustainable.
As a challenge to prevailing
publishing models, these libraries have been building tools and
providing services in support of scholarly publishing,
experimenting with alternative business models, modes of
production, and technologies, in an effort to identify successful
and sustainable scholarly publishing solutions. This session
includes updates on these efforts and reports on recent
projects.
Maria Bonn will reflect on the
growing pains and growing gains—looking at the strategies
Michigan has taken to scale up, their costs and benefits, and
also considering the extent to which they can and should develop
support for some of the traditional publisher functions that are
outside current library realms of expertise.
David Millman will present on issues
of interoperability at Columbia and the re-use of library
materials in publications, for instruction, and for research.
Catherine Mitchell will present on
the collaboration forged among the California Digital Library,
University of California Press and Mark Twain Papers in
exploiting the CDL's existing XTF infrastructure to create
digital critical editions of all of Mark Twain's works. She will
discuss specifically the kinds of editorial and infrastructure
issues born of this collaboration and the project's promise of
both delivering and informing scholarly work.
David Ruddy will report progress on a
collaborative effort by Cornell University Library and the Penn
State Libraries and Press to develop and distribute open source
publishing software. DPubS, developed to support Project Euclid,
Cornell's publishing initiative in mathematics, is a flexible and
extensible publishing platform that will allow libraries to
create alternative and affordable publishing opportunities for
their communities and beyond.
Session 4: PANEL: The
LC/NSF Digital Archiving and Long-term Preservation Research
Program (Digarch): Results and Prospects. (Citadel I and II)
William LeFurgy, Library of Congress;
Ardys Kozbial, University of California, San Diego [presentation]; Margaret Hedstrom, University of
Michigan; and Michael Nelson, Old Dominion University [presentation]
The panel will provide a brief background about the program and reports from three of the 10 projects funded from the first round. Project reports will highlight preliminary findings that may be of broad interest to the digital preservation community. There will be discussion about how the projects relate to other Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) initiatives. Plans will be outlined for a potential second round of Digarch projects, which again will be administered through the National Science Foundation.
William LeFurgy will discuss NDIIPP and Digarch overall; Ardys Kozbial will discuss the “Digital Preservation Lifecycle Management Building”
Digarch project; Margaret Hedstrom will discuss the
“Incentives for Data Producers to Create ‘Archive-Ready' Data Sets”
Digarch project; and Michael Nelson will discuss the “Shared Infrastructure Preservation
Models” Digarch project.
4:30 p.m. – 5:00 p.m. Break (Mezzanine)
5:00 p.m. – 6:30 p.m.
Session 5: Metadata Strategies (Driskill Ballroom)
A) Development and Testing of
Schema for Expressing Copyright Status Information in Metadata:
Recommendations of the Rights Management Framework Group,
California Digital Library.
[presentation] Karen Coyle, California Digital
Library, and Sharon Farb, University of California, Los Angeles
Current efforts to express
intellectual property rights associated with digital materials
have focused on access and usage permissions, but many important
permissions are defined by an item's copyright status
rather than by license or contract. These permissions are not
included in existing rights expressions. Digital libraries hold
and provide access to many items for which copyright status is
the sole governor of use, and even for licensed materials
copyright status is often an essential element for those wishing
to make further use of a work.
The California Digital Library (CDL)
is working on a rights framework that will include
recommendations for metadata to express the copyright status of
digital resources. This metadata should accompany digital
materials and be offered to users to inform them of the copyright
status and potential uses of the item. It also allows the
depositor to clearly state what data about the copyright status
is not known by the holding library or archive, and what data may
be known but has not been provided. Because this copyright
information is often unknown or scant, the metadata includes
fields for contact information for the office or individual who
can best advise on use and permissions for the object in
question.
Early versions of this work have been
presented at the NISO Workshop on Rights Expression and the
Society of American Archivists meeting, both in 2005. CDL has now
developed a first schema language for this metadata and is
seeking partners to test the metadata in actual digital library
settings.
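As a concrete and purely illustrative picture of the categories described above, the record below shows the kind of information such copyright-status metadata might carry. The field names are hypothetical and do not reproduce the CDL schema.

```python
# Illustrative copyright-status record; field names are hypothetical and do not
# reproduce the CDL rights framework schema.
copyright_status_record = {
    "copyright_status": "unknown",      # e.g. public_domain, copyrighted, unknown
    "publication_status": "published",
    "creation_year": 1932,
    "publication_year": None,           # explicitly recorded as not known
    "rights_holder": None,              # "known but not provided" vs. "unknown" matters
    "notes": "Copyright status not determined; renewal search not yet performed.",
    "contact": {
        "name": "Special Collections Rights Office",
        "email": "rights@example.edu",  # placeholder contact for permissions questions
    },
}
```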
B) Truth and Consequences,
Texas: The University of Texas Libraries' Metadata Registry
Project.
Alisha Little and Erik Grostic, University of Texas at Austin [presentation]
The University of Texas at
Austin's Metadata Registry began as a research project in
2001 and morphed into a fast-track development project in 2003.
This presentation will take people through the entire development
and implementation process for the University of Texas
Libraries' Metadata Registry. It will include: the
rationale behind developing in house from scratch, rather than
utilizing or modifying an existing product; the decisions we made
regarding the data model and the use of FRBR and Dublin Core;
what we wanted the system to do vs. what it does do; the perils
of developing a pilot using a pilot (Java Struts); how we use it
and how it works for us; and future development goals and
questions.
C) Sharing Resources by
Collection: OAI Sets and Set Descriptions.
Muriel Foulonneau and Sarah L. Shreeves, University of Illinois at Urbana-Champaign, and Caroline Arms, Library of Congress [presentation]
Many institutions are sharing their
digital resources using metadata-sharing frameworks such as
OAI-PMH. They sometimes organize their resources into subsets,
such as OAI sets, which may or may not correspond to a defined
collection. As the DLF/NSDL Best Practices for OAI Data Provider
Implementations and Shareable Metadata <http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents>
and other research note, clustering resources by collection
contributes to metadata shareability because the collection can
provide context for the individual items being aggregated.
The OAI protocol allows the definition of metadata sets and set
descriptions which can be used to convey collection level
descriptions. Usage of OAI sets and set descriptions varies
considerably among data providers. Service providers are using
collections defined by content providers in a multiplicity of
ways: to build registries; for filtering results; and for ranking
of item level search results.
However, harvesters find useful not
only information about the collection of resources which is
represented by the metadata in the OAI set, but also information
about the collection of metadata records. The distinction between
these two is often fuzzy. This presentation will offer an
analysis of current practice in the OAI domain regarding set and set
description usage and will include the experiences of both a data
provider (Library of Congress) and a service provider (UIUC) on
the challenges of defining and describing sets (collections) of
items in a metadata sharing framework.
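For a concrete picture of the mechanism under discussion, the sketch below reads the sets and set descriptions exposed by an OAI-PMH data provider. The base URL is a placeholder; the element names follow the OAI-PMH 2.0 schema.

```python
# Sketch: reading OAI-PMH set descriptions from a repository's ListSets response.
# The base URL is a placeholder data provider, not a real endpoint.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://example.org/oai"   # placeholder OAI-PMH data provider

with urlopen(f"{BASE_URL}?verb=ListSets") as response:
    tree = ET.parse(response)

for s in tree.iter(f"{OAI}set"):
    spec = s.findtext(f"{OAI}setSpec")
    name = s.findtext(f"{OAI}setName")
    # setDescription typically wraps a collection-level description (e.g. Dublin Core).
    has_description = s.find(f"{OAI}setDescription") is not None
    print(spec, "|", name, "| description provided:", has_description)
```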
Session 6: Dynamic Digital Environments (Citadel I and II)
A) The Evolution of a Digitization Program from Project Based to Large Scale at the University of Texas at Austin Libraries.
Aaron Choate, University of Texas at Austin
As The University of Texas Libraries
continues to build and collaborate on large projects such as
UTOPIA, the Texas Heritage Digitization Initiative, and the Texas
Digital Library, it remains a challenge to also manage ongoing
internal digital project workflows. Aaron Choate and Uri
Kolodney (Digital Library Production Services, UT Libraries) will
discuss the challenges their unit faces in managing parallel
project-based and production workflows as well as how such
projects touch on the management of resources throughout the
library.
B) DAR: A Digital Assets
Repository for Library Collections.
Mohamed Yakout,
Bibliotheca Alexandrina [presentation]
The Digital Assets Repository (DAR)
is a system developed at the Bibliotheca Alexandrina, the Library
of Alexandria, to create and maintain the digital library
collections. DAR acts as a repository for all types of digital
material and provides public access to the digitized collections
through Web-based search and browsing facilities. DAR is also
concerned with the digitization of material already available in
the library or acquired from other research-related institutions.
A digitization laboratory was built for this purpose at the
Bibliotheca Alexandrina.
The system introduces a data model
capable of associating the metadata of different types of
resources with the content such that searching and retrieval can
be done efficiently. The data model is able to describe objects
in either the MARC 21 standard, which is designed for textual
material, or VRA Core, which is a widely used format for describing
images and multimedia. DAR integrates the digitization and OCR
process with the digital repository and introduces as much
automation as possible to minimize the human intervention in the
process. As far as we know, this is an exclusive feature of DAR.
The system is also concerned with the preservation and archiving
of the digitized output and provides access to the collection
through browsing and searching capabilities.
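The sketch below is a rough, hypothetical illustration of the kind of data model described: records in different descriptive formats (MARC 21 for textual material, VRA Core for images and multimedia) are projected onto a small set of shared index fields so that search can ignore the underlying format. It is not DAR's actual design.

```python
# Illustrative sketch only: a repository object whose descriptive metadata may be
# MARC 21 or VRA Core, mapped to a small common index for format-agnostic search.
from dataclasses import dataclass

@dataclass
class DigitalObject:
    object_id: str
    metadata_format: str   # "marc21" or "vra_core"
    metadata: dict         # parsed descriptive record

def index_fields(obj):
    """Project format-specific metadata onto shared index fields."""
    if obj.metadata_format == "marc21":
        # 245$a is the MARC title field, 100$a the main personal name entry.
        return {"title": obj.metadata.get("245a"), "creator": obj.metadata.get("100a")}
    if obj.metadata_format == "vra_core":
        return {"title": obj.metadata.get("title"), "creator": obj.metadata.get("creator")}
    raise ValueError(f"unsupported format: {obj.metadata_format}")
```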
The goal of this project is building
a digital resource repository by supporting the creation, use,
and preservation of varieties of digital resources as well as the
development of management tools. These tools help the library to
preserve, manage and share digital assets. The system is based on
evolving standards for easy integration with Web-based
interoperable digital libraries.
C) Contextualizing the Institutional Repository within Faculty Research.
Deborah Holmes-Wong, Janis Brown, and Sara Tompson, University of Southern California [presentation]
It's very expensive to build an
institutional repository that very few faculty members will use
willingly and potentially damaging to the relationship that
libraries have with their users to rely solely on mandates from
upper administration for faculty compliance with depository
requirements. Faced with this dilemma, librarians at the
University of Southern California conducted a needs assessment
prior to implementing any institutional repository software. We
had a short timeline for the assessment and no funding available.
We began by conducting a literature review on faculty needs in
relation to institutional repositories, and we followed that with
faculty interviews and later focus groups. In the process, we
were able to validate observations made by other researchers
about faculty and their reasons for not using institutional
repositories, and to develop use cases and a requirements document
that will guide our development. We found that while Open Access
to preprints and post-prints is a laudable goal for an
institutional repository, for most faculty members, even those
committed to the ideal of Open Access, depositing in an
institutional repository is extra work. We will discuss an easily
reproducible methodology used to gather information from faculty
members that can be used to construct use cases and requirements.
We will also discuss the results and propose how we will reframe
the institutional repository requirements to make the repository
useful to more faculty members.
7:00 p.m. – 9:30 p.m. POSTERS (Mezzanine)
1) Digital Imaging at the
University of Texas at Austin. Aaron Choate, University of Texas, Austin
The University of Texas Libraries has
been working with Stokes Imaging to refine their digital camera
system (the CaptureStation) and workflow management tool for use
in a collections-focused digitization center. The goal has been
to take a highly accurate digital camera system and build a
flexible product that will allow for the hardware investment to
be leveraged to capture rare books, bulk bound books, negatives
and transparencies and large format materials. John Stokes
(Stokes Imaging) and Aaron Choate (Digital Library Production
Services, UT Libraries) will show the progress they have made and
discuss plans they have for further modifications to the
system.
2) WolfPack: A Distributed File
Conversion Framework. Christopher Kellen, Carnegie
Mellon
WolfPack is a (soon-to-be)
open-source software framework used to automate the processing
and OCRing of scanned images in parallel using a variety of
off-the-shelf programs.
In creating a digital library, a
variety of conversion programs need to operate on each scanned
image; however, these programs often take considerable time and
often run on different software platforms. Manually running these
conversion programs is time-consuming and error-prone. In order
to increase the throughput of our scanning center we sought to
automate this process of deriving files from the original scanned
image.
WolfPack solves this problem by
providing a framework which: analyzes the files one currently
has, determines which derived files are missing, gives these
required conversions to worker processes to work on, collects the
derived files from those worker processes, and stores the
completed work.
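The outline below paraphrases that workflow in a few lines of Python as a sketch of the general pattern (find missing derivatives, queue the required conversions, collect the results); it is not WolfPack's code.

```python
# Sketch of the general pattern described above (not WolfPack's actual code):
# determine which derivative files are missing for each scanned image and hand
# the required conversions to a pool of workers.
import os
from concurrent.futures import ProcessPoolExecutor

REQUIRED_DERIVATIVES = {".txt": "ocr", ".jpg": "make_thumbnail"}  # suffix -> conversion

def missing_conversions(tiff_path):
    base, _ = os.path.splitext(tiff_path)
    return [(conv, tiff_path, base + suffix)
            for suffix, conv in REQUIRED_DERIVATIVES.items()
            if not os.path.exists(base + suffix)]

def run_conversion(task):
    conversion, source, target = task
    # A real worker would invoke the appropriate off-the-shelf program here,
    # possibly on a different software platform.
    print(f"{conversion}: {source} -> {target}")
    return target

def process(scanned_images):
    tasks = [t for img in scanned_images for t in missing_conversions(img)]
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_conversion, tasks))
```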
This distributed file conversion
framework allows one to automate the various file conversions on
different software platforms, perform the work in parallel, and
perform the conversions around the clock, therefore increasing
the overall throughput of the scanning center.
In the past year, WolfPack has been
used to process over a half million pages. The WolfPack source
code is being released under an open source license.
3) Navigating a Sea of Texts:
Topic Maps and the Poetry of Algernon Charles Swinburne. John
Walsh and Michelle Dalmau, Indiana University [image]
Topic Maps, including their XML
representation, XML Topic Maps (XTM), are powerful and flexible
metadata formats that have the potential to transform digital
resource interfaces and support new discovery mechanisms for
humanities data sources, such as large collections of TEI-encoded
literary texts. Proponents of topic maps assert that topic map
structures significantly improve information retrieval, but few
user-based investigations have been conducted to uncover how
humanities researchers and students truly benefit from the rich
and flexible conceptual relationships that comprise topic
maps.
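For readers new to topic maps, the toy example below illustrates the basic topic-and-association structure they provide; it is a simplification for illustration, not the XTM serialization used by the project.

```python
# Toy illustration of the topic-map idea: topics plus typed associations between
# them. A simplification for illustration, not the XTM serialization itself.
topics = {
    "swinburne": {"name": "Algernon Charles Swinburne", "type": "person"},
    "hertha":    {"name": "Hertha", "type": "poem"},
    "pantheism": {"name": "Pantheism", "type": "theme"},
}

associations = [
    {"type": "authorship",   "roles": {"author": "swinburne", "work": "hertha"}},
    {"type": "treats-theme", "roles": {"work": "hertha", "theme": "pantheism"}},
]

def related(topic_id):
    """All topics linked to topic_id through any association."""
    out = set()
    for assoc in associations:
        members = set(assoc["roles"].values())
        if topic_id in members:
            out |= members - {topic_id}
    return out

print(related("hertha"))   # -> {'swinburne', 'pantheism'}
```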
The proposed poster will provide an
introduction to Topic Maps and how a collection of TEI-encoded
literary texts, specifically, the Swinburne Project http://swinburnearchive.indiana.edu,
benefits from the use of topic maps. The poster will also provide
an overview of the methodology used for the comparative usability
study that was designed to assess the strengths and weaknesses of
a topic map-driven interface versus a standard search interface.
The interfaces that were presented to users will be demonstrated
along with key findings from the usability study. Lastly, design
alternatives based on the usability findings will also be
presented.
The results of this study are
intended to move the discussion of topic maps in the digital
humanities beyond demonstrating the novel to providing evidence
of the impact of Topic Maps and their extension of existing
classificatory structures on the humanities researcher's
discovery experience. We hope to provide those who are
implementing topic maps or similar metadata structures in digital
humanities resources with design recommendations that will ensure
successful user interaction.
4) Implications of the Copyright
Office's Recommended Legislation for Orphan Works. Denise
Troll Covey, Carnegie Mellon University
This poster session will provide an
opportunity to explore how the proposed legislation might impact
the creation of digital libraries. Based on an analysis of the
comments submitted in response to the Federal Register Notice of
Inquiry, the transcripts from the public hearings, and the final
report and recommendations prepared by the Copyright Office, the
poster will highlight the issues and compromises relevant to
libraries and archives, termed “large-scale access
uses” in the final report, and invite discussion and
strategic thinking about how digital libraries might leverage the
suggested revision to Section 514, Limitations on Remedies,
should it be enacted into law.
7:00 p.m. – 9:30 p.m. Reception (Mezzanine)
DAY TWO: Tuesday, April 11
8:00 a.m. – 9:00 a.m. Breakfast (Mezzanine)
9:00 a.m. – 10:30 a.m.
Session 7: Managing Digital Library Content (Driskill Ballroom)
A) Everything Old Is New Again:
Repurposing Collections at the University of Michigan Through
Print on Demand.
Terri Geitgey and Shana Kimball, University of Michigan [presentation]
Three years ago, the Scholarly
Publishing Office of the University of Michigan University
Library undertook development and stewardship of a
print-on-demand program, which offers low-cost, high quality
reprints of volumes from the university library's digital
library collections, namely Making of America, Historical Math,
and Michigan Technical Reports, as well as from the American
Council of Learned Societies History E-Book collection. The
program began very modestly, as a little cost-recovery service
operating “on the side,” and growth has been relatively gradual and scalable. However, recent developments,
such as an arrangement with BookSurge to make our titles
available through Amazon, and the recent addition of our metadata
to Bowker's Books in Print, are forcing us to re-examine
our current methods. Many challenges present themselves as we
consider transitioning to a more formal, scalable, full-time
service.
This paper explores why the
University of Michigan University Library chose to develop this
program, how the Scholarly Publishing Office built the
print-on-demand program, and some of the challenges and rewards
of the project. We'll cover the advantages and
disadvantages of our methods, and chart new areas of growth and
development for the program. We'll also touch on how this
type of activity relates to the notion of “library as
publisher” and the idea of selling information. Our goal is
to encourage and enable other libraries to explore
print-on-demand as a way to repurpose digital text
collections.
B) The Next Mother Lode for
Large-scale Digitization?
John Mark Ockerbloom, University of
Pennsylvania [presentation]
Much of the publicity around recent
mass-digitization projects focuses on the millions of books they
promise to make freely readable online. Because of copyright, though, most of
the books provided in full will be of mainly historical interest.
But much of the richest historical text content is not in books
at all, but in the newspapers, magazines, newsletters, and
scholarly journals where events are reported firsthand, stories
and essays make their debut, research findings are announced and
critiqued, and issues of the day are debated. Back runs of many of
these serials are available in major research institutions but
often in few other places. But they have the potential for much
more intensive use, by a much wider community, if they are
digitized and made generally accessible.
In this talk, we will discuss an
inventory of periodical copyright renewals that we have conducted
at Penn. We found that the copyrights of the vast majority of
mid-20th-century American serials of historical interest were not
renewed to their fullest possible extent. The inventory reveals a
rich trove of copyright-free digitizable serial content from
major periodicals as late as the 1960s. Drawing on our experience
with this inventory's production and previous registry
development, we will also show how low-cost, scalable knowledge
bases could be built from this inventory to help libraries more
easily identify freely digitizable serial content, and
collaborate in making it digitally available to the world. Our
initial raw inventory can be found at http://onlinebooks.library.upenn.edu/cce/firstperiod.html
Session 8: Remodeling Digital Library Systems (Citadel I and II)
A) SRU: Version 1.2 and
Beyond.
Robert Sanderson, University of
Liverpool [presentation]
The SRU Implementors Group and Editorial Board met at the beginning of March in The Hague to formalise the changes needed for SRU and CQL 1.2. This presentation will report on those decisions to the wider digital library community, covering the technical changes to the protocol and query language as well as how the changes affect current implementations and those who wish to implement SRU but are not sure why, where, or how to start.
In particular, CQL is expected to change to allow a sort specification to be carried along with the query, and the last non-profilable feature (proximity) will be changed to allow community-specified values.
The SRU request and response formats will be tidied up with some of the rough edges filed down. This will be the first real test of the versioning system designed between version 1.0 and 1.1.
The presentation will also report on the progression towards full standardisation of SRU (now a NISO registered standard) and some thoughts about what the future might bring for digital library interoperability with SRU compliant applications developed outside of our community.
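As a rough illustration of the sort-specification change mentioned above, the snippet below builds an SRU searchRetrieve request whose CQL query carries a sortBy clause. The endpoint is a placeholder, and the exact parameter and modifier names should be checked against the final 1.2 specification.

```python
# Sketch: an SRU searchRetrieve request whose CQL query carries a sort
# specification (one of the 1.2 changes discussed above). The endpoint is a
# placeholder; verify parameter details against the published specification.
from urllib.parse import urlencode

SRU_BASE = "http://example.org/sru"     # placeholder SRU server

params = {
    "operation": "searchRetrieve",
    "version": "1.2",
    "query": 'dc.title = "digital libraries" sortBy dc.date/sort.descending',
    "maximumRecords": "10",
}

url = SRU_BASE + "?" + urlencode(params)
print(url)
# A client would then fetch this URL and parse the searchRetrieveResponse XML.
```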
B) Archiving Courseware
Websites to DSpace, Using Content Packaging Profiles and Web
Services.
William Reilly and Robert Wolfe,
Massachusetts Institute of Technology [presentation]
Standards-based development of new functionality for the DSpace
platform to expose Web Services that import and export “courseware”
Web sites is the focus of an MIT iCampus project, CWSpace. This
presentation reviews these DSpace capabilities (nearing completion):
1) the “Lightweight Network Interface” (LNI), a WebDAV-based
implementation of basic archive services (a SOAP interface is also
provided); 2) a plug-in architecture which permits the use of content
packager plug-ins (e.g. IMS-CP; METS) for both submission
(SIP) and dissemination (DIP); 3) crosswalk plug-ins to accept
descriptive metadata other than Dublin Core (e.g. MODS; LOM), to be
rendered to DSpace's native Qualified Dublin Core.
Key to much of this software development has been the creation of an application profile for the IMS Content Package, serving as a
specification to both the DSpace platform as content consumer, and to
the initial target content provider, MIT's OpenCourseWare (OCW).
The resulting courseware packages—based on a standard, shaped by
this profile—are designed to be interoperable with other
collaborative learning environments and tools (e.g. RELOAD; dotLRN
LORS; other).
Topics addressed in the presentation include issues faced in working with these content packaging standards for archiving complex digital
objects (Web sites); issues in rendering Web sites from within a
repository; issues in (future) development to ingest the newer
“logical” content packages (URLs rather than only local files); issues
concerning intellectual property and student privacy when working with
educational materials.
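The sketch below gives a rough sense of what a WebDAV-style deposit of a packaged Web site into a repository might look like. The endpoint URL, collection path, and packager query parameter are assumptions made for illustration, not the actual CWSpace or DSpace LNI details.

```python
# Sketch of a WebDAV-style PUT of a content package into a repository endpoint.
# The URL, collection path, file name, and packager selector are illustrative
# assumptions, not the actual DSpace LNI interface details.
from urllib.request import Request, urlopen

LNI_COLLECTION = "http://example.org/dspace/lni/dav/collection/123"  # placeholder

with open("ocw-course-package.zip", "rb") as fh:   # hypothetical IMS-CP package
    payload = fh.read()

req = Request(
    LNI_COLLECTION + "?package=ims-cp",            # hypothetical packager selector
    data=payload,
    method="PUT",
    headers={"Content-Type": "application/zip"},
)
with urlopen(req) as resp:
    print(resp.status, resp.reason)
```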
10:30 a.m. – 11:00 a.m. Break (Mezzanine)
11:00 a.m. – 12:30 p.m.
Session 9: (Driskill Ballroom)
PANEL: Surfacing Consistent Topics
Across Aggregated Resource Collections.
David Newman, University of California, Irvine [presentation];
Martin Halbert, Emory University [presentation]; Kat Hagedorn, University of Michigan [presentation]; and
Bill Landis, California Digital Library [presentation]
Surfacing consistent topics across a
heterogeneous collection of information resources is a challenge
faced by many digital libraries. This is true both for
large-scale aggregation services, and for those seeking to
federate a more focused set of resources for a specific audience.
This session provides an overview of clustering and
classification strategies and research, and considers two specific
implementations as a means of engaging the audience in a
discussion of possibilities for automated or semi-automated
topical remediation and enhancement in digital library work.
Note: Four 15-minute
presentations, followed by discussion with the audience.
“Automated Subject Indexing of
Document Collections,” David Newman, UC, Irvine: Clustering
and classification techniques—well known in computer
science—have potentially valuable applications for digital
libraries. This presentation will provide an overview of these
techniques, and discuss the strengths and weaknesses of several
methods to topically organize and categorize a collection of text
documents. We will review several case studies including an
OAI-harvested collection where individual documents vary widely
in their length and content.
“Tools and Findings of the Emory Meta-Combine Project,” Martin Halbert, Emory University: The MetaCombine Project, http://www.metacombine.org, has developed: 1) search techniques for combinations of OAI-PMH and Web resources, 2) semantic clustering and taxonomy assignment for metadata and content, and 3) frameworks for combining digital library components acting as a whole (hence the project name: MetaCombine). The project (funded by The Andrew W. Mellon Foundation) has developed twenty separate software modules as enhancements to the Heritrix Web crawler and other DL tools, and has evaluated these tools with cooperation from the Universities of Illinois and Michigan. This presentation will focus on the MetaCombine project's assessment of the effectiveness of several specific semantic clustering techniques for improving organization and access to bodies of metadata exposed via OAI-PMH as well as Web resources. The project's researchers not only evaluated existing techniques, but also developed a new clustering algorithm (and associated software) based on non-negative matrix factorization, which is more efficient than other techniques for clustering metadata records.
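For readers unfamiliar with the technique, the compact sketch below clusters a handful of metadata records with non-negative matrix factorization using a general-purpose library; it illustrates the approach in principle, not the MetaCombine implementation.

```python
# Sketch: clustering short metadata records with non-negative matrix
# factorization (NMF). Illustrates the general technique, not the MetaCombine code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

records = [
    "Civil War photographs, Mathew Brady studio",
    "Sheet music, American popular songs, 1890-1922",
    "Daguerreotypes and ambrotypes, portrait photography",
    "Ragtime piano rolls and song sheets",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(records)              # term-weight matrix for the records

model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)                    # record-to-topic weights
H = model.components_                         # topic-to-term weights

terms = tfidf.get_feature_names_out()
for k, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```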
“How (Not) to Use a
Semi-automated Classification Tool,” Kat Hagedorn, University of
Michigan: Clustering services hold much promise for providing end
users with a more targeted way of navigating large aggregator
sites like OAIster, as well as more focused federations of
scholarly resources such as those envisioned for the collections
created in the context of the DLF/Aquifer initiative. This
presentation discusses successes and challenges in prototype use
of Emory University's MetaCombine NMF Document Clustering System
Web Service at the University of Michigan.
“Go Fish!: Experiments with
Topical Metadata Enhancement in the American West Project,”
Bill Landis, California Digital Library: The CDL experimented with topical clustering in
support of creating consistent metadata to drive a hierarchical
faceted browse interface for the harvested metadata collection
assembled for the American West Project. This presentation
reviews issues arising from the topical enhancement work done for
this project, speculates on a sustainable process design for
longer term use of this approach, and considers some scenarios
for topic enhancement work in academic digital libraries.
Session 10: Digital Archiving (Citadel I and II)
A) Video Preservation: The Truth Is Out There.
Rick Ochoa and Melitte Buchman,
New York University [presentation] [video]
The Hemispheric Institute Digital
Video Library is currently a two-year collaboration between
NYU's Digital Library Team and the Hemispheric Institute of
Performance and Politics (HI), supported by a grant from the
Mellon Foundation. The HI mission is to provide an open resource
for those scholars, artists and activists working on the relation
between politics and performance in the Americas. To that end the
Digital Library is digitizing and preserving 250 hours of video
per year of original performances, lectures, and symposia.
In shaping a video preservation
strategy, we have encountered many technical challenges. As
curious as it may seem, however, our greatest difficulty in
digitizing video is the semantics of what is meant when
video and preservation are used together. As a
model for video preservation we've looked closely at our
digital imaging initiative and the attempt to ground the digital
image surrogates in authenticity. Ideally, the only perceptible
change is in the container format. We have adopted similar
approaches in grounding video materials, and have met with limited
success due to issues of cost and pragmatism.
Whereas commercial video restoration
implements procedures that produce masters that are often heavily
reworked surrogates (restoration rather than preservation), at NYU
we have developed specific practices to uphold the spirit of
grounding video assets and have chosen to eschew restoration in
favor of preservation.
In this presentation we will talk
about the specific benchmarks we've developed, the areas
we've been able to automate, and the ways we've
differentiated acceptable intervention in the master from
intervention in the derivative.
B) Automated Risk Assessment
for File Formats.
Hannah Frost and Nancy
Hoebelheinrich, Stanford University [presentation]
Stanford's participation in the
National Digital Information Infrastructure and Preservation
Program's (NDIIPP) Archive Ingest and Handling Test (AIHT) provided the
opportunity to automate a mechanism to query a digital object for
assessment of the preservability of its object class by scoring
reported technical characteristics against Stanford Digital
Repository (SDR) preservation policy. The SDR Team developed a
process, integrated into repository ingestion workflow, which
incorporates JHOVE and applies PREMIS. This presentation will
discuss the conceptual underpinnings, operational experiences,
and the potential seen for the file format preservation matrices
used to support SDR policy and services.
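The toy example below suggests the general idea of scoring a file's reported characteristics against a local policy matrix; the formats, statuses, and scores are invented for illustration and do not reflect SDR policy.

```python
# Toy sketch of scoring a JHOVE-style characterization against a local
# preservation policy matrix. Formats, statuses, and scores are invented for
# illustration; they do not reflect SDR policy.
POLICY_MATRIX = {
    # (format, well_formed_and_valid) -> risk score (lower is safer)
    ("TIFF", True):  1,
    ("TIFF", False): 3,
    ("PDF",  True):  2,
    ("PDF",  False): 4,
}

def assess(jhove_report):
    key = (jhove_report.get("format"),
           jhove_report.get("status") == "Well-Formed and valid")
    return POLICY_MATRIX.get(key, 5)   # unknown formats get the highest risk

print(assess({"format": "TIFF", "status": "Well-Formed and valid"}))   # -> 1
```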
12:30 p.m. – 2:30 p.m. Break for Lunch [Individual choice]
1:30 p.m. – 2:30 p.m. POSTERS (Mezzanine)
1) Digital Imaging at the University
of Texas at Austin. Aaron Choate, University of Texas at Austin
The University of Texas Libraries has
been working with Stokes Imaging to refine their digital camera
system (the CaptureStation) and workflow management tool for use
in a collections-focused digitization center. The goal has been
to take a highly accurate digital camera system and build a
flexible product that will allow for the hardware investment to
be leveraged to capture rare books, bulk bound books, negatives
and transparencies and large format materials. John Stokes
(Stokes Imaging) and Aaron Choate (Digital Library Production
Services, UT Libraries) will show the progress they have made and
discuss plans they have for further modifications to the
system.
2) WolfPack: A Distributed File
Conversion Framework. Christopher Kellen, Carnegie
Mellon University
WolfPack is a (soon-to-be)
open-source software framework used to automate the processing
and OCRing of scanned images in parallel using a variety of
off-the-shelf programs.
In creating a digital library, a
variety of conversion programs need to operate on each scanned
image; however, these programs often take considerable time and
often run on different software platforms. Manually running these
conversion programs is time-consuming and error-prone. In order
to increase the throughput of our scanning center we sought to
automate this process of deriving files from the original scanned
image.
WolfPack solves this problem by
providing a framework which: analyzes the files one currently
has, determines which derived files are missing, gives these
required conversions to worker processes to work on, collects the
derived files from those worker processes, and stores the
completed work.
This distributed file conversion
framework allows one to automate the various file conversions on
different software platforms, perform the work in parallel, and
perform the conversions around the clock, therefore increasing
the overall throughput of the scanning center.
In the past year, WolfPack has been
used to process over a half million pages. The WolfPack source
code is being released under an open source license.
3) Navigating a Sea of Texts: Topic
Maps and the Poetry of Algernon Charles Swinburne. John Walsh
and Michelle Dalmau, Indiana University [image]
Topic Maps, including their XML
representation, XML Topic Maps (XTM), are powerful and flexible
metadata formats that have the potential to transform digital
resource interfaces and support new discovery mechanisms for
humanities data sources, such as large collections of TEI-encoded
literary texts. Proponents of topic maps assert that topic map
structures significantly improve information retrieval, but few
user-based investigations have been conducted to uncover how
humanities researchers and students truly benefit from the rich
and flexible conceptual relationships that comprise topic
maps.
The proposed poster will provide an
introduction to Topic Maps and how a collection of TEI-encoded
literary texts, specifically, the Swinburne Project http://swinburnearchive.indiana.edu,
benefits from the use of topic maps. The poster will also provide
an overview of the methodology used for the comparative usability
study that was designed to assess the strengths and weaknesses of
a topic map-driven interface versus a standard search interface.
The interfaces that were presented to users will be demonstrated
along with key findings from the usability study. Lastly, design
alternatives based on the usability findings will also be
presented.
The results of this study are
intended to move the discussion of topic maps in the digital
humanities beyond demonstrating the novel to providing evidence
of the impact of Topic Maps and their extension of existing
classificatory structures on the humanities researcher's
discovery experience. We hope to provide those who are
implementing topic maps or similar metadata structures in digital
humanities resources with design recommendations that will ensure
successful user interaction.
4) Implications of the Copyright
Office's Recommended Legislation for Orphan Works. Denise
Troll Covey, Carnegie Mellon University
This poster session will provide an
opportunity to explore how the proposed legislation might impact
the creation of digital libraries. Based on an analysis of the
comments submitted in response to the Federal Register Notice of
Inquiry, the transcripts from the public hearings, and the final
report and recommendations prepared by the Copyright Office, the
poster will highlight the issues and compromises relevant to
libraries and archives, termed “large-scale access
uses” in the final report, and invite discussion and
strategic thinking about how digital libraries might leverage the
suggested revision to Section 514, Limitations on Remedies,
should it be enacted into law.
2:30 p.m. – 4:00 p.m.
Session 11:
DLF Aquifer: Bringing Collections to
Light. (Driskill Ballroom)
Katherine Kott, DLF Aquifer Director;
Perry Willett and Kat Hagedorn, University of Michigan; Jon Dunn, Indiana University;
Thornton Staples, University of Virginia; and Thomas Habing, University of Illinois at Urbana-Champaign [presentation]
This panel will highlight DLF Aquifer
phase 1 accomplishments. Following a brief project status report,
the program will focus on two project deliverables:
- A DLF Aquifer portal of OAI-harvested MODS records. The
University of Michigan is hosting metadata harvesting for DLF
Aquifer and will demonstrate the DLF Aquifer portal, which
experiments with the DLF MODS Implementation Guidelines for
Cultural Heritage Materials.
- “Asset action packages” to support a consistent
user experience and deeper level of interoperability across
collections and repositories. An asset action package is an
XML-defined set of actionable URIs for a digital resource that
delivers named, typed actions for that resource. Members of the
DLF Aquifer Technology/Architecture Working Group will
demonstrate the application of asset action packages to
aggregated image collections in an OAI service provider (a
minimal sketch of such a package follows this list).
- A third outcome of the past year's work, the DLF MODS
Implementation Guidelines for Cultural Heritage Materials is
proposed as an interactive BOF session.
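As noted above, here is a minimal, hypothetical sketch of the asset action package idea: a small set of named, typed actions for one resource, each bound to an actionable URI. The action names and URLs are placeholders, not the DLF Aquifer specification.

```python
# Illustrative asset action package for one digital resource: named, typed
# actions bound to actionable URIs. Names and URLs are placeholders, not the
# DLF Aquifer specification.
asset_actions = {
    "resource_id": "oai:example.org:map-0042",
    "actions": {
        "getThumbnail":   {"type": "image/jpeg",      "uri": "http://example.org/map-0042/thumb.jpg"},
        "getScreenSize":  {"type": "image/jpeg",      "uri": "http://example.org/map-0042/screen.jpg"},
        "getMetadata":    {"type": "application/xml", "uri": "http://example.org/map-0042/mods.xml"},
        "getItemDisplay": {"type": "text/html",       "uri": "http://example.org/map-0042/"},
    },
}

def action_uri(package, name):
    """Look up the URI for a named action, if the provider offers it."""
    entry = package["actions"].get(name)
    return entry["uri"] if entry else None

print(action_uri(asset_actions, "getThumbnail"))
```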
Session 12: Tools (Citadel I and II)
A) The XML Submission Tool: A
System for Managing Text Collections at Indiana
University.
Dazhi Jiao, Tamara Lopez, and Jenn
Riley, Indiana University [presentation]
XML-based schemes like EAD and the
TEI are attractive to organizations because they normalize the
key concepts in a domain using a structured syntax. Both
standards are document-centric, designed to be
created and read by humans, and characterized by a mixture of
highly structured elements with unstructured content. Because the
XML standard also mandates machine-readability, a perceived
benefit of using XML markup languages is system interoperability.
However, unlike data-centric XML used for transaction processing,
languages like the TEI and EAD are developed in an iterative
editorial process that involves analysis of source text and
encoding. The illusory nature of interoperability in such an
environment is clear: two valid instance documents can employ the
markup language and adhere to content standards in vastly
different ways. The flexibility and complexity inherent in using
mixed-content markup languages thus demand that digital
libraries proactively manage the document creation process. This
is necessary to ensure that encoding and content guidelines are
followed while meeting the descriptive needs of source texts and
the data model requirements of delivery and access systems. The
XML Submission Tool manages the production and workflow of
collections described using XML markup languages. Implemented
using open-source Java software and XML technologies, it allows
document creators to submit documents to collection-specific
rule-based content review, to review descriptive metadata, and to
preview HTML delivery. In addition, the submission tool serves as
an editorial repository that can be integrated with production
systems and digital repositories.
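The toy rule below suggests the kind of collection-specific check such a submission tool might run before accepting a TEI document; the rule and element paths are illustrative only and are not the Indiana tool's actual checks.

```python
# Toy example of a collection-specific content rule applied at submission time:
# require every TEI document to carry a title and at least one keyword term.
# The rule, paths, and namespace (TEI P5) are illustrative assumptions.
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def check_submission(path):
    errors = []
    root = ET.parse(path).getroot()
    if root.find(f".//{TEI}titleStmt/{TEI}title") is None:
        errors.append("missing titleStmt/title")
    if not root.findall(f".//{TEI}keywords/{TEI}term"):
        errors.append("no keywords/term elements for subject access")
    return errors

# errors = check_submission("swinburne-poem.xml")   # hypothetical document
# print(errors or "document passes collection rules")
```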
B) The Archivists' Toolkit:
Streamlining Production and Standardizing Archival
Information.
[presentation]
Bradley Westbrook, University of California, San Diego;
Lee Mandell and Jason Varghese, New York University.
The Archivists' Toolkit is a multi-institution, multi-year project initially funded by the Digital Library Federation and subsequently by The Andrew W. Mellon Foundation. This project update will occur several weeks before the beta version of the AT application is scheduled to be released for testing to the project partner repositories. The project update will consist of an account of how the application specification has been modified as a result of public comment last fall, and it will describe the testing process planned for the application. A considerable portion of the presentation will be devoted to demonstrating a prototype of the application and several of its salient features such as ingest of legacy data, recording of archival resource information, and production of EAD encoded finding aids, METS encoded digital objects, and administrative reports. Substantial time will be allocated to questions from attendees.
4:00 p.m. – 4:15 p.m. Break (Mezzanine)
4:15 p.m. – 5:15 p.m. BIRDS OF A FEATHER 1
1) DLF Aquifer MODS Implementation Guidelines: Overview/Discussion of Comments and Changes (Driskill Ballroom)
Sarah L. Shreeves, University of Illinois at Urbana-Champaign; Laura Akerman, Emory University; John Chapman, University of Minnesota; Melanie Feltner-Reichert, University of Tennessee; Bill Landis, California Digital Library; David Reynolds, The Johns Hopkins University; Jenn Riley, Indiana University; Liz Milewicz, Emory University; and Gary Shawver, New York University
This BOF will span both BOF slots (4:15-5:15 and 5:25-6:25); attendees should feel free to drop in at any time. Although there may be some
overlap between the two sessions, the first BOF will focus on an
overview of the comments received and a discussion of the changes made
to the guidelines. The second session will largely be devoted to an
open discussion of the best approach to a central question raised by
the guidelines and comments received: how and where to describe the
original analog object and its digital surrogate.
The Metadata Working Group of the DLF Aquifer Initiative has developed a set of implementation guidelines for the Metadata Object Description
Schema (MODS). The guidelines were developed to encourage creation of
rich, shareable metadata that is coherent and consistent, and, thus,
useful to aggregators and end users. The draft guidelines were widely
distributed for community input in December 2005; the comment process
ended in early February. Since then the Metadata Working Group has been
reviewing comments and making changes to the Guidelines. Members of the
Working Group will present an overview of the comments received and the
proposed changes to the guidelines, soliciting additional feedback from
members of the DLF community.
2) Global Identifier Resolution:
Developers' Forum. (Maximilian Room)
Tim DiLauro, Organizer, Johns Hopkins University, and John Kunze, Organizer, California Digital Library
This BOF is for anyone interested in possible follow-on activities and topics arising in the Developers' Forum panel from Session 1 on Monday. It concerns the automatic mapping of identifiers to information objects, known as resolution, which is complicated by the diversity of available identifier schemes, resolution technologies, and expected uses. Likely topics include exploring practical collaborations in generalized and/or centralized resolution services.
3) Electronic Records Archives:
Systems and Metadata Architectures. (Austin Room)
Quyen Nguyen and Dyung Le, U.S. National Archives and Records Administration
The Electronic Records Archives (ERA) system will be a future archives system in the digital object world. It will authentically preserve any type of electronic record, created by any entity in the Federal Government, and it will provide this electronic information anytime and anyplace to anyone with an interest and legal right to access it. Within such a system, whose main goals are to preserve and provide access to digital records over time, metadata management is a critical service.
In this paper, we will present typical use case scenarios of ERA that involve or require metadata management. These use cases will encompass the creation, retrieval, update, and deletion of metadata of digital records throughout the record life cycle. The ERA system has to meet multiple challenges. On one hand, ERA has to deal with challenges that are inherent to the digital object world; on the other hand, it has to fulfill the requirements posed by the business practices of the archival community in the context of NARA's mission.
We also study different database management models (relational, object-relational, native XML) for the ERA metadata repositories. The study will focus on how these technologies can satisfy the systems engineering principles of ERA such as performance, scalability, availability, backup, and recovery.
Meeting the information retrieval needs of the diverse, and potentially huge, ERA user community, given the resource limitations of ERA, is a serious challenge. We will discuss options being considered by NARA to meet this challenge. ERA is intended to exist for an essentially indefinite period of time, and its Service-Oriented Architecture provides the flexibility to evolve over time as technology changes, including changing out COTS products. There are no current or emerging standards (other than for metadata) governing the Enterprise Search arena. Hence there is a real danger of becoming locked into a particular Enterprise Search vendor's proprietary approach. The paper will discuss the related technical issues and possible mitigations.
Finally, since the ERA architecture is based on Web services technologies and is meant to be used by NARA personnel and records managers at federal agencies as well as the general public, an appropriate security scheme based on user access roles has to be implemented in order to protect the integrity of record metadata.
4) Update of Activities of the DLF Services Framework Working Group. (Jim Hogg Parlor)
Geneva Henry, Rice University
The DLF Services Framework Working Group (SFWG) seeks to understand and model the research library in today's academic environment. Our mission is to develop a framework within which the services offered by libraries, both business logic and computer processes, can be understood in relation to other parts of the institutional and external information landscape. This framework will help research institutions plan wisely for providing the services needed to meet the current and emerging information needs of their constituents.
This Birds of a Feather session will provide an overview of the group's current work and the issues that have been identified to date. Approaches for creating the framework will be discussed, along with the methodologies under consideration for capturing the business logic and software for successful development of the framework. The group's preliminary white paper and presentation, presented to the DLF Steering Committee in May 2005 (available at http://www.diglib.org/architectures/serviceframe/), provide an overview of the motivation for this work. Participants are encouraged to provide feedback and ideas that will contribute to the group's activities.
Creating a framework showing the abstraction of services that can be identified throughout digital libraries will allow a more holistic view of the information environment, facilitating better planning for incorporation of shared services, integration between, and interoperability among digital library systems and processes.
The SFWG is actively identifying existing similar efforts, such as the JISC e-Framework Initiative, that are currently underway so as to benefit from their work, avoid duplication of efforts, and leverage collaborative findings. Existing standards, policies and protocols for identifying and describing business processes are being examined so that an appropriate model can be adopted that will allow the services framework that is developed to be commonly understood when examined by a diverse group of readers. The research institutions that are the primary target audience for this work will be included in the research being undertaken so that they will have an opportunity to provide input on the way information resources and services are provided at their institutions. Since the goal is to provide a framework that can be implemented to ensure these needs are met, it is important that these organizations understand their current landscape and how the framework can assist in future planning. A full time researcher, the 2006 DLF Distinguished Fellow for the Services Framework Initiative, will lead the research, working with the established DLF Services Framework Working Group that was formed in 2004 and has been actively pursuing this work to date.
5:15 p.m. – 5:25 p.m. Break (Mezzanine)
5:25 p.m. – 6:25 p.m. BIRDS OF A FEATHER 2
1) DLF Aquifer MODS Implementation
Guidelines. (Driskill Ballroom)
Sarah L. Shreeves, University of Illinois at Urbana-Champaign; Laura Akerman, Emory University; John Chapman, University of Minnesota; Melanie Feltner-Reichert, University of Tennessee; Bill Landis, California Digital Library; David Reynolds, The Johns Hopkins University; Jenn Riley, Indiana University; Liz Milewicz, Emory University; and Gary Shawver, New York University
This BOF will span both BOF slots (4:15-5:15 and 5:25-6:25); attendees should feel free to drop in at any time. Although there may be some
overlap between the two sessions, the first BOF will focus on an
overview of the comments received and a discussion of the changes made
to the guidelines. The second session will largely be devoted to an
open discussion of the best approach to a central question raised by
the guidelines and comments received: how and where to describe the
original analog object and its digital surrogate.
The comment period for the DLF Aquifer Metadata Working Group's draft implementation guidelines for MODS ended in early February.
Commenters on the draft guidelines raised a couple of basic
philosophical questions, focusing on how and where to describe the
original analog object and its digital surrogate. The Working Group
would like to discuss the different approaches recommended by
reviewers and engage our user community in a face-to-face conversation
about some of these questions, including the relationship between FRBR
and data structures such as MARC and MODS. This will be an interactive
program that will give both the Aquifer Metadata Working Group and
potential users of the MODS guidelines an opportunity to discuss in
real time the issues raised during the comment period.
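As one concrete illustration of the question at issue, the sketch below shows a minimal MODS record, built with Python's standard ElementTree library, in which the digital surrogate is described as the primary object and the analog original is relegated to a relatedItem of type "original". This is only one of the approaches raised by reviewers, not the Working Group's recommendation, and the titles and values are invented for the example.

# Illustrative sketch only: one possible way to separate the digital
# surrogate (primary record) from its analog original (relatedItem).
# Not the approach adopted by the Aquifer MODS guidelines.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, tag)

mods = ET.Element(q("mods"))

# Describe the digital surrogate as the primary object.
title_info = ET.SubElement(mods, q("titleInfo"))
ET.SubElement(title_info, q("title")).text = "Map of Austin, Texas (digital image)"
phys = ET.SubElement(mods, q("physicalDescription"))
ET.SubElement(phys, q("digitalOrigin")).text = "reformatted digital"
ET.SubElement(phys, q("internetMediaType")).text = "image/jp2"

# Describe the analog original in a relatedItem.
original = ET.SubElement(mods, q("relatedItem"), {"type": "original"})
orig_phys = ET.SubElement(original, q("physicalDescription"))
ET.SubElement(orig_phys, q("extent")).text = "1 map ; 60 x 90 cm"

print(ET.tostring(mods, encoding="unicode"))

The inverse arrangement, with the analog original described as the primary object and the digital manifestation in the relatedItem, is the other obvious option, and weighing these choices is part of what the BOF will take up.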
2) Archivists' Toolkit. (Maximilian Room)
Bradley Westbrook, University of California, San Diego
The Archivists' Toolkit is a multi-institution, multi-year project initially funded by the Digital Library Federation and subsequently by The Andrew W. Mellon Foundation. A brief project update and a demonstration of the Archivists' Toolkit will be presented as part of session 12 of the DLF Spring Forum. This BOF will serve as a follow-up to that presentation and will give DLF attendees the opportunity to ask additional questions about the Archivists' Toolkit application and to discuss in greater detail with project team members some of the application's design features, as well as functional areas being considered as additions in subsequent development phases.
3) DLF Inter-institutional
Communication. (Austin Room)
Michael Pelikan, Pennsylvania State University, and David
Seaman, Digital Library Federation
The Newsletter saw a big surge in
submissions when we first breathed life back into it. People were
pleased with the switch to XHTML, and seemed to understand that
their submissions were feeding not only the Newsletter but also DLF
registries.
Since then, especially in the past
calendar year, the submission rate has fallen off sharply. I cannot
browbeat submissions out of colleagues who are busy doing the
very projects we'd all most like to hear about; indeed, when
they're ready (or sooner!) we'll hear about them at the
Forum!
I'd like to query an interested group
of attendees as to some of the following:
- What can DLF do to foster communication between its member
institutions?
- If the Newsletter is useful, what can we do to ease or
normalize its timely production?
- Along those lines, is it time for a few pilot experiments
either with authenticated blogs, or a DLF-hosted wiki with
authenticated editing access?
- Should we be pushing stuff out with RSS? If so, fine.
- Where will we get the content and who will feed it in?
- Shall we offer to edit or redact it?
- Which of these, if any, will people use, buy into, and get
enthusiastic about, while also giving DLF the data needed to keep
its registries up to date?
4) Central
Repository for a DL How-to. (Jim Hogg Parlor)
Jewel Ward, University of Southern California and Barrie Howard, Digital Library Federation
Currently, digital library (DL) how-to information is spread out in a variety of locations online or in printed textbooks, and is often out of date. We believe there is a need for a central repository that contains current information on “how to build a digital library.” If one of the ideals of our profession is to provide access to information, the idealistic vision for this project would be “providing access to information about how to share information.”
We would like to discuss approaches to this topic, especially regarding what colleagues think is needed, what kind of information and content the site should contain, who the intended audience would be, and how this site could be created and maintained. The trick, as some have pointed out, is to create a how-to that is not so detailed it becomes useless, nor so high level that it provides little practical guidance.
The initial vision is for a publicly available Web site that covers the end-to-end building of a predefined range of digital library services from a workflow perspective. The envisioned audiences are low-resource, first- through third-world institutions around the globe that need a reference or starting point when employees are faced with, “how and where do I begin?” Another thought is that it could be a best practices portal site, as well as one that could be translated into other languages. We believe that a DL how-to site with useful content would be a nice complement to current open-source digital library software.
DAY THREE: Wednesday, April 12
8:00 a.m. – 9:00 a.m. Breakfast (Mezzanine)
9:00 a.m. – 10:30 a.m.
Session 13: Digital Library Services (Driskill Ballroom)
A) Recommendations and Ranking:
Experiments in Next Generation Library Catalogs.
Brian Tingle, California Digital Library. [presentation]
During the last decade, there have
been fundamental changes in the way that people find and use
information on the Internet. Google, Amazon, eBay, and other
successful commercial services have introduced technical
approaches such as relevance ranking, personalization,
recommending and faceted browsing that have fundamentally
reshaped user expectations. Currently, search results from
library catalogs are not presented to the user in a transparent or
usefully ranked manner, in stark contrast to Internet search
engines. Nor do library systems offer the recommending and
personalization services that are so popular with users in
e-commerce settings. Recent Mellon Foundation-funded research by
the California Digital Library into how library catalogs can
offer such modern search features will be presented and
discussed.
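By way of a rough illustration of what relevance ranking means in this context (and not a description of the CDL's actual system), the following Python sketch scores a handful of invented catalog titles against a query using simple tf-idf weights and returns the records in ranked order.

# Minimal sketch of term-weighted relevance ranking over catalog titles.
# Illustrative only; it does not reflect the CDL's research system.
import math
from collections import Counter

records = {
    1: "introduction to digital libraries",
    2: "digital preservation and metadata",
    3: "history of texas libraries",
}

def tokenize(text):
    return text.lower().split()

# Document frequency for each term across the (tiny) catalog.
df = Counter()
for title in records.values():
    df.update(set(tokenize(title)))

def score(query, title, n_docs):
    """Sum tf-idf weights of query terms found in a record title."""
    terms = Counter(tokenize(title))
    total = 0.0
    for term in tokenize(query):
        if term in terms:
            idf = math.log(n_docs / df[term])
            total += terms[term] * idf
    return total

query = "digital libraries"
ranked = sorted(records,
                key=lambda rid: score(query, records[rid], len(records)),
                reverse=True)
print(ranked)  # record IDs ordered by estimated relevance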
B) Unbundling the ILS:
Deploying an E-commerce Catalog Search Solution.
Andrew Pace and Emily Lynema, North Carolina State University [presentation]
The explosive growth of the Internet and the accompanying achievements in searching technology have highlighted the weaknesses of traditional library catalogs in today's information environment. Search engines and e-commerce tools that specialize in finding and presenting useful search results have become popular alternatives for many patrons. In response, NCSU Libraries has unbundled keyword searching of the library catalog from the functionality provided by the back-office integrated system. This presentation will provide an overview of the local implementation process, including an environmental scan of the marketplace and an introduction to the commercial software chosen. A demonstration of the library's new catalog search will reveal advances in natural language searching, relevance ranking, result-set exploration, and response time, as well as new features like “true browsing” of the collection by the Library of Congress Classification scheme. The presenters will address the technical architecture and requirements for co-existence with the legacy catalog, as well as future plans (including a FRBR-like record display), usability testing, and assessment plans.
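The "true browsing" feature can be pictured with a small, purely hypothetical sketch. The commercial product NCSU deployed is not shown here; the fragment below, with invented sample records, only conveys the underlying idea of grouping records by the alphabetic class portion of their Library of Congress call numbers.

# Hypothetical sketch of collection browsing by LC Classification class.
# The NCSU implementation uses commercial software; this only illustrates
# the general idea of grouping records by call-number prefix.
from collections import defaultdict

# (call number, title) pairs as they might come from a catalog export.
records = [
    ("QA76.9.D3 S545 2005", "Database systems"),
    ("QA76.76.O63 T67 2004", "Operating systems"),
    ("Z665 .L697 2003", "Foundations of library and information science"),
]

def lcc_class(call_number):
    """Take the leading alphabetic class letters, e.g. 'QA' or 'Z'."""
    letters = []
    for ch in call_number:
        if ch.isalpha():
            letters.append(ch)
        else:
            break
    return "".join(letters)

browse = defaultdict(list)
for call_number, title in records:
    browse[lcc_class(call_number)].append(title)

for lcc in sorted(browse):
    print(lcc, "->", browse[lcc])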
Session 14: Packaging and Performance (Citadel I and II)
A) The Music Encoding
Initiative (MEI).
Perry Roland, University of Virginia [presentation]
The ability to more easily create
richly and consistently encoded musical sources would support the
analysis and cross-comparison of musical data by enabling
activities such as building structured virtual annotated
compilations of various instantiations of a work, or contextual
searching and detailed data retrieval across indexed XML
representations. The Music Encoding Initiative (MEI) DTD is a
developing standard for such work.
The purpose of the MEI DTD is twofold:
to provide a standardized, universal XML encoding format for
music content (and its accompanying metadata) and to facilitate
interchange of the encoded data. MEI is not designed to be an
input code per se, like the Plaine and Easie code; however, it is
intended to be human-readable and easily understood and applied.
MEI has a significant advantage over other proposed XML standards
that define an entirely new terminology because it uses familiar
names for elements and attributes. Using common music notation
terminology has the benefit of making MEI files more
human-readable, and makes clear the correspondence between
MEI-encoded data and music notation. The true potential of MEI is
that a single file can encode multiple variations of a musical
work and generate multiple outputs. Because of its emphasis on
comprehensiveness, comprehensibility, and software independence,
MEI may also function as an archival data format.
The presentation will describe the
features of the MEI DTD and the advantages of its use as an
encoding standard. Methods for capturing data in MEI will be
discussed and a brief demonstration of displaying MEI data will
be given.
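To make the "single file, multiple outputs" point concrete, the sketch below processes a small encoded fragment in which two sources transmit different readings of the same passage and realizes one output per source. The element and attribute names are simplified stand-ins chosen for the example and should not be taken as the exact MEI DTD vocabulary.

# Illustrative sketch: selecting one variant reading from a single encoded
# source file. Element names are simplified stand-ins, not a guaranteed
# match for the MEI DTD; consult the DTD for the actual vocabulary.
import xml.etree.ElementTree as ET

encoded = """
<measure n="1">
  <note pname="c" oct="4" dur="4"/>
  <app>
    <rdg source="ms-A"><note pname="e" oct="4" dur="4"/></rdg>
    <rdg source="ms-B"><note pname="g" oct="4" dur="4"/></rdg>
  </app>
</measure>
"""

def realize(measure, source):
    """Flatten a measure into a note list, choosing readings from one source."""
    notes = []
    for child in measure:
        if child.tag == "note":
            notes.append(child.attrib)
        elif child.tag == "app":
            for rdg in child.findall("rdg"):
                if rdg.get("source") == source:
                    notes.extend(n.attrib for n in rdg.findall("note"))
    return notes

measure = ET.fromstring(encoded)
print(realize(measure, "ms-A"))  # the passage as transmitted in source ms-A
print(realize(measure, "ms-B"))  # the passage as transmitted in source ms-B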
B) METS Profile Development at
the Library of Congress: An Update.
Morgan Cundiff, Library of
Congress [presentation]
The Library of Congress has continued
to develop METS Profiles for specific types of digital objects.
This presentation will feature recent development of profiles for
audio or video Recorded Events, Photographs, Historical
Newspapers, and Bibliographic Records. Explanation and
demonstration of these object types will be based on items in the
online application “Library of Congress Presents: Music,
Theater, and Dance”. Specific topics included will be: 1)
developing a consistent methodology for profile creation, 2)
using METS and MODS together to represent object structure, 3)
creating tools for validating METS documents (i.e., checking for
compliance with a given profile), and 4) moving toward METS
harvesting and interoperation. Discussion from the floor will be
welcomed.
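As a hypothetical illustration of point 3 (and not the Library of Congress's actual tooling), a profile-compliance check might layer a simple rule, such as inspecting the PROFILE attribute on the root mets:mets element, on top of ordinary schema validation. The Python sketch below uses lxml; the file paths and profile identifier in the usage comment are placeholders.

# Hypothetical sketch of a METS validation check: schema validity plus a
# simple profile rule (PROFILE attribute on the root element).
from lxml import etree

METS_NS = "http://www.loc.gov/METS/"

def validate_mets(doc_path, schema_path, expected_profile):
    doc = etree.parse(doc_path)
    schema = etree.XMLSchema(etree.parse(schema_path))

    # 1. Check schema validity against the METS XSD.
    if not schema.validate(doc):
        return False, [str(e) for e in schema.error_log]

    # 2. Check a minimal profile-level rule: the declared PROFILE value.
    root = doc.getroot()
    if root.tag != "{%s}mets" % METS_NS:
        return False, ["root element is not mets:mets"]
    if root.get("PROFILE") != expected_profile:
        return False, ["PROFILE attribute does not match the expected profile"]

    return True, []

# Example usage (placeholder paths and profile identifier):
# ok, problems = validate_mets("object.xml", "mets.xsd",
#                              "http://example.org/mets/profile")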
C) Automated Generation of METS
Records for Digital Objects.
Nate Trail, Library of Congress [presentation]
This presentation will demonstrate a
loose set of configurable tools to generate METS objects from
files and metadata automatically. For Library of Congress
Presents: Music, Theater and Dance, we ingest files of digitized
content and merge them with metadata from various data sources to
build our METS objects. The demonstration will show conversion of
file system directory structures into XML documents, SRU searching
for bibliographic data, and JDBC searching for rights and other
item-specific data stored in common databases. The objects are then
indexed and stored for future rendering according to the METS
profile for that object.
We use open source applications and
tools (especially Cocoon and XSL) to interact with various data
components. For each type of digitized content, we may need to
interact with different databases for metadata, or expect to see
different file structures and file types, so the stylesheets and
Cocoon pipelines are broken into small steps that can be easily
re-used or modified. This enables us to more rapidly ingest
collections of digitized content according to METS profiles we
develop.
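The directory-to-XML step can be pictured with a minimal Python sketch that walks a directory of digitized content and emits a METS fileSec listing each file. The actual workflow described above runs on Cocoon and XSL pipelines; this fragment only conveys the general shape of the transformation, and the file-group USE value is invented for the example.

# Minimal sketch of the directory-to-METS idea: list the files in a
# directory of digitized content and emit a fileSec describing them.
import os
import mimetypes
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def build_filesec(content_dir):
    """Return a mets:fileSec element describing every file under content_dir."""
    filesec = ET.Element("{%s}fileSec" % METS_NS)
    filegrp = ET.SubElement(filesec, "{%s}fileGrp" % METS_NS, {"USE": "master"})
    for i, name in enumerate(sorted(os.listdir(content_dir)), start=1):
        path = os.path.join(content_dir, name)
        if not os.path.isfile(path):
            continue
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        f = ET.SubElement(filegrp, "{%s}file" % METS_NS,
                          {"ID": "FILE%04d" % i, "MIMETYPE": mime})
        ET.SubElement(f, "{%s}FLocat" % METS_NS,
                      {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: name})
    return filesec

print(ET.tostring(build_filesec("."), encoding="unicode"))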
10:30 a.m. – 11:00 a.m. Break (Mezzanine)
11:00 a.m. – 12:30p.m.
Session 15: PANEL: The
Open Content Alliance, Introduction and Progress
Report. (Driskill Ballroom)
Rick Prelinger, the Internet Archive [presentation];
Robin Chandler, California Digital Library [presentation]; and Merrilee Proffitt, RLG [presentation]
In October 2005, the Internet Archive
announced a partnership of libraries and technology interests
including the University of California, the University of
Toronto, the European Archive, the National Archives (UK),
O'Reilly Media, Inc., Adobe, and Hewlett Packard Labs.
Shortly after, RLG, the Biodiversity Heritage Library, Emory
University, Johns Hopkins University Libraries, Rice University,
University of Texas, University of Virginia, and others joined
the newly formed Open Content Alliance. This unique partnership
of public and private organizations seeks to digitize and make
freely available published, out-of-copyright material ... to any party.
This panel will discuss the formation of the OCA, its principles,
working groups, and what the group intends to do in order to meet
a goal of having a mass of material on line and ready for use by
October 2006. The panel will allow for plenty of time for
audience discussion and input.
Session 16: PANEL:
Listening to Users: How User Communities Can Inform
Design. (Citadel I and II)
Ellen Meltzer, Felicia Poe, and Tracy
Seneca, California Digital Library [presentation]
Outline of the panel:
- 1. Listening to users: Creating more useful digital library tools and services by understanding the needs of user communities.
- 2. The Calisphere Project: Supporting the use of university digital resources by multiple user communities.
- 3. The Web-at-Risk Project: Enabling curators to capture and manage collections of Web-published government and political information.
In order to create more useful digital library tools and services, we must first understand the needs of our user communities. In this panel discussion, we will describe what the California Digital Library has learned from carrying out an array of assessment activities with our current and potential users. Through the presentation of several projects in differing stages of development, we will share our growing insight into digital library user communities, including students, faculty, K-12 teachers, librarians, archivists and others. Panelists will explore the effective use of focus groups, interviews, surveys, and usability testing.
12:30 p.m. Adjourn
POST-CONFERENCE: Wednesday, April 12
12:45 p.m. – 1:45 p.m.
METS Community Meeting—open to all (Driskill Ballroom)
1:00 p.m. – 5:00 p.m.
Developers' Forum—open to all (Chisholm Trail Room)
Attendees are asked to consider preparing an
informal 5-minute micro-presentation as described below.
Meeting Schedule
- 1:00–1:30. An update from Stephen Abrams about the GDFR project,
and some preliminary thoughts on a follow-on JHOVE
project to define its next-generation architecture.
- 1:30–3:30. Round table 5-minute micro-presentations on “coolest new
technology, most over-hyped technology, technical
problems calling for group discussion, and
opportunities for collaboration or standardization.”
- 3:30–4:00. Break with light snacks.
- 4:00–4:45. Discussion of topics for possible technical session and
BOF at the next main DLF Forum.
- 4:45–5:00. Select technical topic and next temporary co-chair, if
appropriate.
2:00 p.m. – 5:30 p.m.
METS Editorial Board Meeting—for participants only (Driskill Ballroom)
POST-CONFERENCE: Thursday, April 13
8:30 a.m. – 1:00 p.m.
METS Editorial Board Meeting (Austin Room)