
DIGITAL LIBRARY FEDERATION

SPRING FORUM 2006


AUSTIN, TX


APRIL 10 – 12, 2006


The Driskill Hotel
604 Brazos Street
Austin, Texas 78701
(800) 252-9367
Floor Plan



PRECONFERENCE: Monday, April 10

8:30 a.m. – 11:30 a.m.

DLF Aquifer Metadata Working Group Meeting—for project participants (Governor's Boardroom)

DLF Aquifer Services Working Group Meeting—for project participants (Maximilian Room)

DLF Aquifer Technology/Architecture Working Group Meeting—for project participants (Chisholm Trail Room)

DAY ONE: Monday, April 10

10:30 a.m. – 1:00 p.m. Registration (Mezzanine)

11:30 a.m. – 12:15 p.m. First-time Attendee Orientation (Driskill Ballroom)

12:45 p.m. – 1:00 p.m. Opening Remarks (Driskill Ballroom)

1:00 p.m. – 2:30 p.m.

Session 1: PANEL: Developers' Forum: Global Identifier Resolution. (Driskill Ballroom)

Tim DiLauro, Moderator, Johns Hopkins University; John Kunze, California Digital Library [presentation]; Eva Müller, Uppsala Universitet [Sweden] [presentation]; and Herbert Van de Sompel, Los Alamos National Laboratory Research Library [presentation]

Digital object access via stable identifiers is an important problem for all digital libraries. The automatic mapping of identifiers to information objects, known as “resolution”, is complicated by the diversity of available identifier schemes, resolution technologies, and expected uses.

A long-standing challenge for digital libraries is how to make resolution more stable and deterministic for the information objects they steward. Unable to control other providers' services, we struggle to make ongoing choices among providers, their objects and identifiers—the “Their Stuff” problem. Conversely, we also struggle to set up our own services so as to provide the best resolution experience to our users—the “Our Stuff” problem.

For example, in the “Their Stuff” category, a large amount of metadata (and more and more often, actual content) is being aggregated and indexed based on both proprietary and open harvesting protocols such as OAI-PMH. Because of the potential to harvest non-URL-based identifiers (e.g., URN:NBN, Handle) and the absence of a standard mechanism that can resolve all (or even most) of them, it is generally necessary to find a URL equivalent for each digital object in the harvested metadata. This makes it difficult to do things such as resolving to one of a number of copies, depending on which is available at a given time.

Two possible approaches to solving this and similar problems would be to generalize and/or centralize resolution. Creating a more generalized mechanism would make it easier to develop common practice—and common code—across many content stores with many identifier types. Developing a more centralized solution would obviate the need for every system that operates on identifiers to implement its own complete set of resolution services. These approaches might even encourage new service models.
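
To make the generalized approach concrete, the sketch below (illustrative only; the scheme prefixes, resolver base URLs, and the first-reachable-copy policy are assumptions, not any panelist's implementation) shows how a scheme-agnostic resolver might map a harvested identifier to one of several candidate copies.

    # Hypothetical sketch of a scheme-agnostic identifier resolver. The
    # resolver table and the "first reachable copy" policy are illustrative
    # assumptions, not a description of any panelist's system.
    import urllib.request

    RESOLVERS = {
        # prefix: (resolver base URL, keep the scheme prefix in the path?)
        "ark:/":    ("https://n2t.net/", True),
        "urn:nbn:": ("https://nbn-resolving.org/", True),
        "hdl:":     ("https://hdl.handle.net/", False),
        "doi:":     ("https://doi.org/", False),
    }

    def to_urls(identifier, copies=None):
        """Map a non-URL identifier to one or more candidate URLs."""
        urls = list(copies or [])
        low = identifier.lower()
        for prefix, (base, keep) in RESOLVERS.items():
            if low.startswith(prefix):
                urls.append(base + (identifier if keep else identifier[len(prefix):]))
        return urls

    def resolve(identifier, copies=None, timeout=5):
        """Return the first candidate URL that currently responds."""
        for url in to_urls(identifier, copies):
            try:
                with urllib.request.urlopen(url, timeout=timeout):
                    return url
            except OSError:
                continue
        return None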

The speakers on this panel will discuss some new approaches to global identifier resolution. They will address such issues as generalized, scheme-agnostic mechanisms, resolving to different copies of an object, and persistence.

Session 2: PANEL: Implementing the PREMIS Data Dictionary. (Citadel I and II)

[joint presentation of Caplan and Guenther] Priscilla Caplan, Florida Center for Library Automation; Rebecca Guenther, Library of Congress; Nancy Hoebelheinrich, Stanford University [presentation]; and Marcus Enders, Niedersächsische Staats- und Universitätsbibliothek Göttingen [first presentation] [second presentation]

In May 2005, the PREMIS Working Group (Preservation Metadata: Implementation Strategies) released its Data Dictionary for Preservation Metadata, which defines and describes an implementable set of core preservation metadata with broad applicability to digital preservation repositories. In November 2005, this international working group, comprising 30 members from five countries, won the prestigious Digital Preservation Award, sponsored by the Digital Preservation Coalition and part of the UK Conservation Awards. This panel will discuss progress and problems in implementing the PREMIS data dictionary and some of the implementation choices to be made, with a particular focus on its use in METS. It will consist of a brief high-level introduction to PREMIS and a panel discussion of two implementations and their similarities and differences.

  1. Introduction to PREMIS: Priscilla Caplan (Florida Center for Library Automation) Overview of PREMIS, its assumptions and its neutrality in terms of any particular implementation. Choices for implementation will be reviewed (i.e. using the PREMIS schema published on the MA site; incorporating pieces of the schema into METS; or, incorporating into another framework such as DIDL).
  2. Use of PREMIS with METS: A panel of three will discuss how the PREMIS data elements might be incorporated into METS. Marcus Enders (Niedersächsische Staats- und Universitätsbibliothek Göttingen) will discuss the MathARC implementation. Nancy Hoebelheinrich (Stanford University) will present Stanford's implementation of PREMIS in METS. Rebecca Guenther (Library of Congress) will outline the general issues to be considered in implementing PREMIS in a METS context and review how the two applications have approached it similarly and differently. The panel will then discuss the various approaches and take questions.
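
As a rough illustration of the kind of choice the panelists will compare, the sketch below builds a PREMIS object description inside a METS administrative metadata section. The PREMIS namespace URI, the MDTYPE value, and the techMD placement are assumptions made for the sake of a runnable example, not the Stanford or MathARC profiles.

    # Minimal sketch of wrapping a PREMIS object description inside a METS
    # administrative metadata section. The PREMIS namespace URI, the MDTYPE
    # value, and the techMD placement are assumptions for illustration;
    # real profiles differ on exactly these points.
    import xml.etree.ElementTree as ET

    METS = "http://www.loc.gov/METS/"
    PREMIS = "http://www.loc.gov/standards/premis"   # assumed namespace

    ET.register_namespace("mets", METS)
    ET.register_namespace("premis", PREMIS)

    mets = ET.Element(f"{{{METS}}}mets")
    amd = ET.SubElement(mets, f"{{{METS}}}amdSec", ID="AMD1")
    tech = ET.SubElement(amd, f"{{{METS}}}techMD", ID="PREMIS-OBJ1")
    wrap = ET.SubElement(tech, f"{{{METS}}}mdWrap",
                         MDTYPE="OTHER", OTHERMDTYPE="PREMIS")  # assumption
    xmldata = ET.SubElement(wrap, f"{{{METS}}}xmlData")

    obj = ET.SubElement(xmldata, f"{{{PREMIS}}}object")
    oid = ET.SubElement(obj, f"{{{PREMIS}}}objectIdentifier")
    ET.SubElement(oid, f"{{{PREMIS}}}objectIdentifierType").text = "local"
    ET.SubElement(oid, f"{{{PREMIS}}}objectIdentifierValue").text = "file-0001.tif"
    chars = ET.SubElement(obj, f"{{{PREMIS}}}objectCharacteristics")
    fixity = ET.SubElement(chars, f"{{{PREMIS}}}fixity")
    ET.SubElement(fixity, f"{{{PREMIS}}}messageDigestAlgorithm").text = "MD5"
    ET.SubElement(fixity, f"{{{PREMIS}}}messageDigest").text = "d41d8cd98f00b204e9800998ecf8427e"

    print(ET.tostring(mets, encoding="unicode"))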

2:30 p.m. – 3:00 p.m. Break (Mezzanine)

3:00 p.m. – 4:30 p.m.

Session 3: PANEL: Libraries and Publishing—Reports from the Field. (Driskill Ballroom)

Maria Bonn, University of Michigan [presentation]; David Millman, Columbia University [presentation]; Catherine Mitchell, California Digital Library [presentation]; and David Ruddy, Cornell University [presentation]

For several years, a number of DLF member libraries have been exploring active roles in the scholarly publishing domain. These efforts were sparked by shared concerns: increasing costs, diminishing access, loss of control of scholarly content, greater consolidation of commercial publishing—in general, an environment that appeared increasingly restrictive, expensive, and unsustainable.

As a challenge to prevailing publishing models, these libraries have been building tools and providing services in support of scholarly publishing, experimenting with alternative business models, modes of production, and technologies, in an effort to identify successful and sustainable scholarly publishing solutions. This session includes updates on these efforts and reports on recent projects.

Maria Bonn will reflect on the growing pains and growing gains—looking at the strategies Michigan has taken to scale up, their costs and benefits, and also considering the extent to which they can and should develop support for some of the traditional publisher functions that are outside current library realms of expertise.

David Millman will present on issues of interoperability at Columbia and the re-use of library materials in publications, for instruction, and for research.

Catherine Mitchell will present on the collaboration forged among the California Digital Library, University of California Press and Mark Twain Papers in exploiting the CDL's existing XTF infrastructure to create digital critical editions of all of Mark Twain's works. She will discuss specifically the kinds of editorial and infrastructure issues born of this collaboration and the project's promise of both delivering and informing scholarly work.

David Ruddy will report progress on a collaborative effort by Cornell University Library and the Penn State Libraries and Press to develop and distribute open source publishing software. DPubS, developed to support Project Euclid, Cornell's publishing initiative in mathematics, is a flexible and extensible publishing platform that will allow libraries to create alternative and affordable publishing opportunities for their communities and beyond.

Session 4: PANEL: The LC/NSF Digital Archiving and Long-term Preservation Research Program (Digarch): Results and Prospects. (Citadel I and II)

William LeFurgy, Library of Congress; Ardys Kozbial, University of California, San Diego [presentation]; Margaret Hedstrom, University of Michigan; and Michael Nelson, Old Dominion University [presentation]

The panel will provide a brief background about the program and reports from three of the 10 projects funded from the first round. Project reports will highlight preliminary findings that may be of broad interest to the digital preservation community. There will be discussion about how the projects relate to other Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) initiatives. Plans will be outlined for a potential second round of Digarch projects, which again will be administered through the National Science Foundation.

William LeFurgy to discuss NDIIPP and Digarch overall; Ardys Kozbial to discuss the “Digital Preservation Lifecycle Management Building” Digarch project; Margaret Hedstrom to discuss “Incentives for Data Producers to Create ‘Archive-Ready' Data Sets” Digarch project; and Michael Nelson to discuss “Shared Infrastructure Preservation Models” Digarch project.

4:30 p.m. – 5:00 p.m. Break (Mezzanine)


5:00 p.m. – 6:30 p.m.

Session 5: Metadata Strategies (Driskill Ballroom)

A) Development and Testing of Schema for Expressing Copyright Status Information in Metadata: Recommendations of the Rights Management Framework Group, California Digital Library.

[presentation] Karen Coyle, California Digital Library, and Sharon Farb, University of California, Los Angeles

Current efforts to express intellectual property rights associated with digital materials have focused on access and usage permissions, but many important permissions are defined by an item's copyright status rather than by license or contract. These permissions are not included in existing rights expressions. Digital libraries hold and provide access to many items for which copyright status is the sole governor of use, and even for licensed materials copyright status is often an essential element for those wishing to make further use of a work.

The California Digital Library (CDL) is working on a rights framework that will include recommendations for metadata to express the copyright status of digital resources. This metadata should accompany digital materials and be offered to users to inform them of the copyright status and potential uses of the item. It also allows the depositor to clearly state what data about the copyright status is not known by the holding library or archive, and what data may be known but has not been provided. Because this copyright information is often unknown or scant, the metadata includes fields for contact information for the office or individual who can best advise on use and permissions for the object in question.

Early versions of this work have been presented at the NISO Workshop on Rights Expression and the Society of American Archivists meeting, both in 2005. CDL has now developed a first schema language for this metadata and is seeking partners to test the metadata in actual digital library settings.
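
The sketch below gives a feel for the kind of record being proposed; the field names and status vocabulary are invented for illustration and do not reproduce the CDL schema.

    # Hypothetical sketch of a copyright-status record of the kind described
    # above; the field names and status vocabulary are invented for
    # illustration and do not reproduce the CDL schema.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class CopyrightStatus:
        status: str                                # e.g. "copyrighted", "public_domain", "unknown"
        publication_status: Optional[str] = None   # "published", "unpublished", or None if not known
        creation_date: Optional[str] = None
        notes: List[str] = field(default_factory=list)
        # Contact for questions about use and permissions, since the status
        # is often unknown or only partially documented.
        rights_contact: Optional[str] = None

    record = CopyrightStatus(
        status="unknown",
        publication_status="published",
        creation_date="1948",
        notes=["Renewal search not yet performed."],
        rights_contact="Special Collections, rights@example.edu",
    )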

B) Truth and Consequences, Texas: The University of Texas Libraries' Metadata Registry Project.

Alisha Little and Erik Grostic, University of Texas at Austin [presentation]

The University of Texas at Austin's Metadata Registry began as a research project in 2001 and morphed into a fast-track development project in 2003. This presentation will take people through the entire development and implementation process for the University of Texas Libraries' Metadata Registry. It will include: the rationale behind developing in-house from scratch, rather than utilizing or modifying an existing product; the decisions we made regarding the data model and the use of FRBR and Dublin Core; what we wanted the system to do vs. what it does do; the perils of developing a pilot using a pilot (Java Struts); how we use it and how it works for us; and future development goals and questions.

C) Sharing Resources by Collection: OAI Sets and Set Descriptions.

Muriel Foulonneau and Sarah L. Shreeves, University of Illinois at Urbana-Champaign, and Caroline Arms, Library of Congress [presentation]

Many institutions are sharing their digital resources using metadata-sharing frameworks such as OAI-PMH. They sometimes organize their resources into subsets, such as OAI sets, which may or may not correspond to a defined collection. As the DLF/NSDL Best Practices for OAI Data Provider Implementations and Shareable Metadata <http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents> and other research note, clustering resources by collection improves metadata shareability because the collection can provide context for the individual items aggregated. The OAI protocol allows the definition of sets and set descriptions, which can be used to convey collection-level descriptions. Usage of OAI sets and set descriptions varies considerably among data providers. Service providers are using collections defined by content providers in a multiplicity of ways: to build registries; to filter results; and to rank item-level search results.

However, harvesters find useful not only information about the collection of resources which is represented by the metadata in the OAI set, but also information about the collection of metadata records. The distinction between these two is oftentimes fuzzy. This presentation will present an analysis of current practice in the OAI domain of set and set description usage and will include the experiences of both a data provider (Library of Congress) and a service provider (UIUC) on the challenges of defining and describing sets (collections) of items in a metadata sharing framework.
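
For orientation, the sketch below reads set descriptions from an OAI-PMH ListSets response. The ListSets verb and the setSpec, setName, and setDescription elements come from the OAI-PMH specification; the endpoint URL is a placeholder.

    # Sketch of reading set descriptions from an OAI-PMH ListSets response.
    # The ListSets verb and the setSpec, setName, and setDescription elements
    # come from OAI-PMH 2.0; the endpoint URL below is a placeholder.
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    BASE_URL = "https://example.org/oai"   # placeholder data provider

    def list_sets(base_url=BASE_URL):
        with urllib.request.urlopen(base_url + "?verb=ListSets") as resp:
            tree = ET.parse(resp)
        for s in tree.iter(OAI + "set"):
            spec = s.findtext(OAI + "setSpec")
            name = s.findtext(OAI + "setName")
            # setDescription may carry a collection-level description (often
            # oai_dc); here we only note whether one is present.
            has_description = s.find(OAI + "setDescription") is not None
            yield spec, name, has_description

    if __name__ == "__main__":
        for spec, name, described in list_sets():
            print(spec, "-", name, "(described)" if described else "")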

Session 6: Dynamic Digital Environments (Citadel I and II)

A) The Evolution of a Digitization Program from Project Based to Large Scale at the University of Texas at Austin Libraries.

Aaron Choate, University of Texas at Austin

As The University of Texas Libraries continues to build and collaborate on large projects such as UTOPIA, the Texas Heritage Digitization Initiative, and the Texas Digital Library, it remains a challenge to also manage ongoing internal digital project workflows. Aaron Choate and Uri Kolodney (Digital Library Production Services, UT Libraries) will discuss the challenges their unit faces in managing parallel project-based and production workflows, as well as how such projects touch on the management of resources throughout the library.

B) DAR: A Digital Assets Repository for Library Collections.

Mohamed Yakout, Bibliotheca Alexandrina [presentation]

The Digital Assets Repository (DAR) is a system developed at the Bibliotheca Alexandrina, the Library of Alexandria, to create and maintain the digital library collections. DAR acts as a repository for all types of digital material and provides public access to the digitized collections through Web-based search and browsing facilities. DAR is also concerned with the digitization of material already available in the library or acquired from other research-related institutions. A digitization laboratory was built for this purpose at the Bibliotheca Alexandrina.

The system introduces a data model capable of associating the metadata of different types of resources with the content such that searching and retrieval can be done efficiently. The data model can describe objects in either the MARC 21 standard, which is designed for textual material, or VRA Core, a widely used format for describing images and multimedia. DAR integrates the digitization and OCR process with the digital repository and introduces as much automation as possible to minimize human intervention in the process. As far as we know, this is an exclusive feature of DAR. The system is also concerned with the preservation and archiving of the digitized output and provides access to the collection through browsing and searching capabilities.
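
A minimal sketch of such a data model (with invented names rather than DAR's actual schema) might pair each digital object with a metadata record in whichever format suits its resource type.

    # Illustrative sketch (not the DAR schema) of a data model that pairs a
    # digital object's content files with a metadata record in whichever
    # format suits the resource type, e.g. MARC 21 for texts or VRA Core for
    # images. All names here are invented.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class MetadataRecord:
        format: str      # "MARC21" or "VRACore" (illustrative values)
        xml: str         # serialized record

    @dataclass
    class DigitalObject:
        object_id: str
        content_files: List[str]   # paths or URLs to the digitized files
        descriptive: MetadataRecord

    book = DigitalObject(
        object_id="dar:0001",
        content_files=["0001/page-001.tif", "0001/page-002.tif"],
        descriptive=MetadataRecord(format="MARC21", xml="<record>...</record>"),
    )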

The goal of this project is to build a digital resource repository by supporting the creation, use, and preservation of a variety of digital resources, as well as the development of management tools. These tools help the library to preserve, manage, and share digital assets. The system is based on evolving standards for easy integration with Web-based interoperable digital libraries.

C) Contextualizing the Institutional Repository within Faculty Research.

Deborah Holmes-Wong, Janis Brown, and Sara Tompson, University of Southern California [presentation]

It is very expensive to build an institutional repository that few faculty members will use willingly, and relying solely on mandates from upper administration for faculty compliance with deposit requirements can damage the relationship that libraries have with their users. Faced with this dilemma, librarians at the University of Southern California conducted a needs assessment prior to implementing any institutional repository software. We had a short timeline for the assessment and no funding available. We began by conducting a literature review on faculty needs in relation to institutional repositories, and we followed that with faculty interviews and later focus groups. In the process, we were able to validate observations made by other researchers about faculty and their reasons for not using institutional repositories, and to develop use cases and a requirements document that will guide our development. We found that while Open Access to preprints and post-prints is a laudable goal for an institutional repository, for most faculty members, even those committed to the ideal of Open Access, it is extra work to publish to an institutional repository. We will discuss an easily reproducible methodology used to gather information from faculty members that can be used to construct use cases and requirements. We will also discuss the results and propose how we will reframe the institutional repository requirements to make the repository useful to more faculty members.

7:00 p.m. – 9:30 p.m. POSTERS (Mezzanine)

1) Digital Imaging at the University of Texas at Austin. Aaron Choate, University of Texas at Austin

The University of Texas Libraries has been working with Stokes Imaging to refine their digital camera system (the CaptureStation) and workflow management tool for use in a collections-focused digitization center. The goal has been to take a highly accurate digital camera system and build a flexible product that will allow for the hardware investment to be leveraged to capture rare books, bulk bound books, negatives and transparencies and large format materials. John Stokes (Stokes Imaging) and Aaron Choate (Digital Library Production Services, UT Libraries) will show the progress they have made and discuss plans they have for further modifications to the system.

2) WolfPack: A Distributed File Conversion Framework. Christopher Kellen, Carnegie Mellon

WolfPack is a (soon-to-be) open-source software framework used to automate the processing and OCRing of scanned images in parallel using a variety of off-the-shelf programs.

In creating a digital library, a variety of conversion programs need to operate on each scanned image; however, these programs often take considerable time and often run on different software platforms. Manually running these conversion programs is time-consuming and error-prone. In order to increase the throughput of our scanning center, we sought to automate this process of deriving files from the original scanned image.

WolfPack solves this problem by providing a framework which: analyzes the files one currently has, determines which derived files are missing, gives these required conversions to worker processes to work on, collects the derived files from those worker processes, and stores the completed work.

This distributed file conversion framework allows one to automate the various file conversions on different software platforms, perform the work in parallel, and perform the conversions around the clock, therefore increasing the overall throughput of the scanning center.
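
The dispatch loop described above might look roughly like the sketch below; the derivative rules, the worker function, and the use of a process pool are placeholders rather than WolfPack's actual design.

    # Toy sketch of the dispatch loop described above: decide which derived
    # files are missing, hand the conversions to worker processes, and
    # collect the results. The derivative rules and the converter are
    # placeholders, not WolfPack's actual code.
    import os
    from concurrent.futures import ProcessPoolExecutor

    DERIVATIVES = (".jpg", ".txt")   # placeholder: access image and OCR text

    def missing_work(scan_path):
        """Yield (scan, target) pairs for derivatives that do not exist yet."""
        stem, _ = os.path.splitext(scan_path)
        for ext in DERIVATIVES:
            target = stem + ext
            if not os.path.exists(target):
                yield scan_path, target

    def convert(job):
        scan, target = job
        # Placeholder: a real worker would invoke the appropriate
        # off-the-shelf converter or OCR engine for this target type.
        with open(target, "w") as out:
            out.write(f"derived from {scan}\n")
        return target

    def run(scans):
        jobs = [job for scan in scans for job in missing_work(scan)]
        with ProcessPoolExecutor() as pool:
            for finished in pool.map(convert, jobs):
                print("stored", finished)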

In the past year, WolfPack has been used to process over a half million pages. The WolfPack source code is being released under an open source license.

3) Navigating a Sea of Texts: Topic Maps and the Poetry of Algernon Charles Swinburne. John Walsh and Michelle Dalmau, Indiana University [image]

Topic Maps, including their XML representation, XML Topic Maps (XTM), are powerful and flexible metadata formats that have the potential to transform digital resource interfaces and support new discovery mechanisms for humanities data sources, such as large collections of TEI-encoded literary texts. Proponents of topic maps assert that topic map structures significantly improve information retrieval, but few user-based investigations have been conducted to uncover how humanities researchers and students truly benefit from the rich and flexible conceptual relationships that comprise topic maps.

The proposed poster will provide an introduction to Topic Maps and how a collection of TEI-encoded literary texts, specifically the Swinburne Project (http://swinburnearchive.indiana.edu), benefits from the use of topic maps. The poster will also provide an overview of the methodology used for the comparative usability study that was designed to assess the strengths and weaknesses of a topic map-driven interface versus a standard search interface. The interfaces that were presented to users will be demonstrated along with key findings from the usability study. Lastly, design alternatives based on the usability findings will also be presented.

The results of this study are intended to move the discussion of topic maps in the digital humanities beyond demonstrating the novel to providing evidence of the impact of Topic Maps and their extension of existing classificatory structures on the humanities researcher's discovery experience. We hope to provide those who are implementing topic maps or similar metadata structures in digital humanities resources with design recommendations that will ensure successful user interaction.

4) Implications of the Copyright Office's Recommended Legislation for Orphan Works. Denise Troll Covey, Carnegie Mellon University

This poster session will provide an opportunity to explore how the proposed legislation might impact the creation of digital libraries. Based on an analysis of the comments submitted in response to the Federal Register Notice of Inquiry, the transcripts from the public hearings, and the final report and recommendations prepared by the Copyright Office, the poster will highlight the issues and compromises relevant to libraries and archives, termed “large-scale access uses” in the final report, and invite discussion and strategic thinking about how digital libraries might leverage the suggested revision to Section 514, Limitations on Remedies, should it be enacted into law.

7:00 p.m. – 9:30 p.m. Reception (Mezzanine)

DAY TWO: Tuesday, April 11

8:00 a.m. – 9:00 a.m. Breakfast (Mezzanine)

9:00 a.m. – 10:30 a.m.

Session 7: Managing Digital Library Content (Driskill Ballroom)

A) Everything Old Is New Again: Repurposing Collections at the University of Michigan Through Print on Demand.

Terri Geitgey and Shana Kimball, University of Michigan [presentation]

Three years ago, the Scholarly Publishing Office of the University of Michigan University Library undertook development and stewardship of a print-on-demand program, which offers low-cost, high-quality reprints of volumes from the university library's digital library collections, namely Making of America, Historical Math, and Michigan Technical Reports, as well as from the American Council of Learned Societies History E-Book collection. The program began very modestly, as a small cost-recovery service operating “on the side,” and growth has been relatively gradual and scalable. However, recent developments, such as an arrangement with BookSurge to make our titles available through Amazon and the recent addition of our metadata to Bowker's Books in Print, are forcing us to re-examine our current methods. Many challenges present themselves as we consider transitioning to a more formal, scalable, full-time service.

This paper explores why the University of Michigan University Library chose to develop this program, how the Scholarly Publishing Office built the print-on-demand program, and some of the challenges and rewards of the project. We'll cover the advantages and disadvantages of our methods, and chart new areas of growth and development for the program. We'll also touch on how this type of activity relates to the notion of “library as publisher” and the idea of selling information. Our goal is to encourage and enable other libraries to explore print-on-demand as a way to repurpose digital text collections.

B) The Next Mother Lode for Large-scale Digitization?

John Mark Ockerbloom, University of Pennsylvania [presentation]

Much of the publicity around recent mass-digitization projects focuses on the millions of books they promise to make freely readable online. Because of copyright, though, most of the books provided in full will be of mainly historical interest. But much of the richest historical text content is not in books at all, but in the newspapers, magazines, newsletters, and scholarly journals where events are reported firsthand, stories and essays make their debut, research findings are announced and critiqued, and issues of the day are debated. Back runs of many of these serials are available in major research institutions but often in few other places. But they have the potential for much more intensive use, by a much wider community, if they are digitized and made generally accessible.

In this talk, we will discuss an inventory of periodical copyright renewals that we have conducted at Penn. We found that copyrights of the vast majority of mid-20th-century American serials of historical interest were not renewed to their fullest possible extent. The inventory reveals a rich trove of copyright-free digitizable serial content from major periodicals as late as the 1960s. Drawing on our experience with this inventory's production and previous registry development, we will also show how low-cost, scalable knowledge bases could be built from this inventory to help libraries more easily identify freely digitizable serial content, and collaborate in making it digitally available to the world. Our initial raw inventory can be found at http://onlinebooks.library.upenn.edu/cce/firstperiod.html

Session 8: Remodeling Digital Library Systems (Citadel I and II)

A) SRU: Version 1.2 and Beyond.

Robert Sanderson, University of Liverpool [presentation]

The SRU Implementors Group and Editorial Board met at the beginning of March in The Hague to formalise the changes needed for SRU and CQL 1.2. This presentation will report those decisions to the wider digital library community, including the technical changes to the protocol and query language, and will also discuss how these changes affect current implementations and those wishing to implement SRU but unsure why, where, or how to start.

In particular, changes are expected to CQL to allow a sort specification to be carried along with the query and the last non-profilable feature (proximity) will be changed to allow community specified values.
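
As a hypothetical illustration, a 1.2-style searchRetrieve request carrying a sort specification inside the CQL query might be assembled as below; the endpoint, index names, and exact sortBy syntax are assumptions to be checked against the final specification.

    # Hypothetical sketch of a version 1.2 searchRetrieve request whose CQL
    # query carries a sort specification. The endpoint, index names, and the
    # exact sortBy syntax are assumptions to be checked against the final
    # specification.
    from urllib.parse import urlencode

    BASE = "https://example.org/sru"   # placeholder SRU endpoint

    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": 'dc.title = "digital libraries" sortBy dc.date/sort.descending',
        "maximumRecords": "10",
        "recordSchema": "dc",
    }
    print(BASE + "?" + urlencode(params))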

The SRU request and response formats will be tidied up with some of the rough edges filed down. This will be the first real test of the versioning system designed between version 1.0 and 1.1.

The presentation will also report on the progression towards full standardisation of SRU (now a NISO registered standard) and some thoughts about what the future might bring for digital library interoperability with SRU compliant applications developed outside of our community.

B) Archiving Courseware Websites to DSpace, Using Content Packaging Profiles and Web Services.

William Reilly and Robert Wolfe, Massachusetts Institute of Technology [presentation]

Standards-based development of new functionality for the DSpace platform to expose Web Services that import and export “courseware” Web sites is the subject of an MIT iCampus project, CWSpace. This presentation reviews these DSpace capabilities (nearing completion): 1) the “Lightweight Network Interface” (LNI), a WebDAV-based implementation of basic archive services (a SOAP interface is also provided); 2) a plug-in architecture which permits the use of content packager plug-ins (e.g. IMS-CP; METS) for both submission (SIP) and dissemination (DIP); 3) crosswalk plug-ins to accept descriptive metadata other than Dublin Core (e.g. MODS; LOM), to be rendered to DSpace's native Qualified Dublin Core.
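
For orientation, a deposit against a WebDAV-style interface of this kind might look like the sketch below; the endpoint path, credentials, and content type are placeholder assumptions, not the actual LNI API.

    # Sketch of depositing a content package over a WebDAV-style PUT, in the
    # spirit of the LNI described above. The endpoint path, credentials, and
    # content type are placeholder assumptions, not the actual LNI API.
    import base64
    import urllib.request

    LNI_URL = "https://repo.example.edu/lni/dav/collection/123/package.zip"  # placeholder
    USER, PASSWORD = "depositor", "secret"                                   # placeholders

    def deposit(package_path, url=LNI_URL):
        with open(package_path, "rb") as f:
            body = f.read()
        request = urllib.request.Request(url, data=body, method="PUT")
        request.add_header("Content-Type", "application/zip")
        token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
        request.add_header("Authorization", "Basic " + token)
        with urllib.request.urlopen(request) as resp:
            return resp.status

    # deposit("ocw-course-sip.zip")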

Key to much of this software development has been the creation of an application profile for the IMS Content Package, serving as a specification to both the DSpace platform as content consumer, and to the initial target content provider, MIT's OpenCourseWare (OCW). The resulting courseware packages—based on a standard, shaped by this profile—are designed to be interoperable with other collaborative learning environments and tools (e.g. RELOAD; dotLRN LORS; other).

Topics addressed in the presentation include issues faced in working with these content packaging standards for archiving complex digital objects (Web sites); issues in rendering Web sites from within a repository; issues in (future) development to ingest the newer “logical” content packages (URLs rather than only local files); issues concerning intellectual property and student privacy when working with educational materials.

10:30 a.m. – 11:00 a.m. Break (Mezzanine)

11:00 a.m. – 12:30 p.m.

Session 9: PANEL: Surfacing Consistent Topics Across Aggregated Resource Collections. (Driskill Ballroom)

David Newman, University of California, Irvine [presentation]; Martin Halbert, Emory University [presentation]; Kat Hagedorn, University of Michigan [presentation]; and Bill Landis, California Digital Library [presentation]

Surfacing consistent topics across a heterogeneous collection of information resources is a challenge faced by many digital libraries. This is true both for large-scale aggregation services, and for those seeking to federate a more focused set of resources for a specific audience. This session provides an overview of clustering and classification strategies and research, and considers two specific implementations as a means of engaging the audience in a discussion of possibilities for automated or semi-automated topical remediation and enhancement in digital library work.

Note: Four 15-minute presentations, followed by discussion with the audience.

“Automated Subject Indexing of Document Collections,” David Newman, UC, Irvine: Clustering and classification techniques—that are well known in computer science—have potentially valuable applications for digital libraries. This presentation will provide an overview of these techniques, and discuss the strengths and weaknesses of several methods to topically organize and categorize a collection of text documents. We will review several case studies including an OAI-harvested collection where individual documents vary widely in their length and content.

“Tools and Findings of the Emory Meta-Combine Project,” Martin Halbert, Emory University: The MetaCombine Project (http://www.metacombine.org) has developed: 1) search techniques for combinations of OAI-PMH and Web resources, 2) semantic clustering and taxonomy assignment for metadata and content, and 3) frameworks for combining digital library components acting as a whole (hence the project name: MetaCombine). The project (funded by The Andrew W. Mellon Foundation) has developed twenty separate software modules as enhancements to the Heritrix Web crawler and other DL tools, and has evaluated these tools with cooperation from the Universities of Illinois and Michigan. This presentation will focus on the MetaCombine project's assessment of the effectiveness of several specific semantic clustering techniques for improving organization of and access to bodies of metadata exposed via the OAI-PMH as well as Web resources. The project's researchers not only evaluated existing techniques, but also developed a new mathematical algorithm (and associated software) for clustering, termed non-negative matrix factorization, which is more efficient than other techniques for clustering metadata records.
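
As a small illustration of the general technique (using scikit-learn's off-the-shelf NMF as a stand-in for the project's own implementation, and invented record text), topic clustering of metadata records can be sketched as follows.

    # Small sketch of topic clustering of metadata records with non-negative
    # matrix factorization, using scikit-learn's off-the-shelf NMF as a
    # stand-in for the project's own implementation and invented record text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    records = [
        "Photographs of ranch life in west Texas, 1890-1920",
        "Oral histories of cotton farming communities",
        "Sheet music for popular songs of the 1890s",
        "Maps of cattle trails and railroad routes",
    ]   # placeholder metadata text

    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(records)

    nmf = NMF(n_components=2, random_state=0)
    W = nmf.fit_transform(X)    # record-to-topic weights
    H = nmf.components_         # topic-to-term weights

    terms = tfidf.get_feature_names_out()
    for k, topic in enumerate(H):
        top_terms = [terms[i] for i in topic.argsort()[-3:][::-1]]
        print(f"topic {k}: {', '.join(top_terms)}")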

“How (Not) to Use a Semi-automated Classification Tool,” Kat Hagedorn, University of Michigan: Clustering services hold much promise for providing end users with a more targeted way of navigating large aggregator sites like OAIster, as well as more focused federations of scholarly resources such as those envisioned for the collections created in the context of the DLF/Aquifer initiative. This presentation discusses successes and challenges in prototype use of Emory University's MetaCombine NMF Document Clustering System Web Service at the University of Michigan.

“Go Fish!: Experiments with Topical Metadata Enhancement in the American West Project,” Bill Landis, California Digital Library: The CDL experimented with topical clustering in support of creating consistent metadata to drive a hierarchical faceted browse interface for the harvested metadata collection assembled for the American West Project. This presentation reviews issues arising from the topical enhancement work done for this project, speculates on a sustainable process design for longer term use of this approach, and considers some scenarios for topic enhancement work in academic digital libraries.

Session 10: Digital Archiving (Citadel I and II)

A) Video Preservation: The Truth Is Out There.

Rick Ochoa and Melitte Buchman, New York University [presentation] [video]

The Hemispheric Institute Digital Video Library is currently a two-year collaboration between NYU's Digital Library Team and the Hemispheric Institute of Performance and Politics (HI), supported by a grant from the Mellon Foundation. The HI mission is to provide an open resource for scholars, artists, and activists working on the relation between politics and performance in the Americas. To that end, the Digital Library is digitizing and preserving 250 hours of video per year of original performances, lectures, and symposia.

In shaping a video preservation strategy, we have encountered many technical challenges. As curious as it may seem, however, our greatest difficulty in digitizing video is the semantics of what is meant when video and preservation are used together. As a model for video preservation we've looked closely at our digital imaging initiative and the attempt to ground the digital image surrogates in authenticity. Ideally, the only perceptible change is in the container format. We have adopted similar approaches in grounding video materials, and have met with limited success due to issues of cost and pragmatism.

Whereas commercial video restoration implements procedures to produce masters that are often heavily reworked surrogates (restoration rather than preservation), at NYU we have developed specific practices to uphold the spirit of grounding video assets and have chosen to eschew restoration in favor of preservation.

In this presentation we will talk about specific benchmarks that we've developed, areas we've been able to automate, and ways that we've differentiated acceptable intervention in the master and in the derivative.

B) Automated Risk Assessment for File Formats.

Hannah Frost and Nancy Hoebelheinrich, Stanford University [presentation]

Stanford's participation in the National Digital Information Infrastructure and Preservation Program's (NDIIPP) Archive Ingest and Handling Test (AIHT) provided the opportunity to automate a mechanism to query a digital object and assess the preservability of its object class by scoring reported technical characteristics against Stanford Digital Repository (SDR) preservation policy. The SDR Team developed a process, integrated into the repository ingestion workflow, which incorporates JHOVE and applies PREMIS. This presentation will discuss the conceptual underpinnings, operational experiences, and the potential seen for the file format preservation matrices used to support SDR policy and services.
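
The scoring idea can be pictured with the toy sketch below; the policy table, score values, and the shape of the characteristics dictionary are assumptions, not SDR's actual preservation matrices or JHOVE's output format.

    # Illustrative sketch of scoring a file's reported characteristics
    # against a format policy, in the spirit of the process described above.
    # The policy table, score values, and the shape of the characteristics
    # dictionary are assumptions, not SDR's matrices or JHOVE's output.
    FORMAT_POLICY = {
        # format name: (support level, base score)
        "TIFF": ("full", 1.0),
        "PDF": ("limited", 0.6),
        "WordPerfect": ("none", 0.1),
    }

    def assess(characteristics):
        """Return (support level, score) for one object's reported traits."""
        fmt = characteristics.get("format", "unknown")
        level, score = FORMAT_POLICY.get(fmt, ("unknown", 0.0))
        if characteristics.get("wellFormed") is False:
            score *= 0.5   # penalize objects the validator flags as not well-formed
        return level, score

    print(assess({"format": "TIFF", "wellFormed": True, "valid": True}))
    print(assess({"format": "PDF", "wellFormed": False}))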

12:30 p.m. – 2:30 p.m. Break for Lunch [Individual choice]

1:30 p.m. – 2:30 p.m. POSTERS (Mezzanine)

1) Digital Imaging at the University of Texas at Austin. Aaron Choate, University of Texas at Austin

The University of Texas Libraries has been working with Stokes Imaging to refine their digital camera system (the CaptureStation) and workflow management tool for use in a collections-focused digitization center. The goal has been to take a highly accurate digital camera system and build a flexible product that will allow for the hardware investment to be leveraged to capture rare books, bulk bound books, negatives and transparencies and large format materials. John Stokes (Stokes Imaging) and Aaron Choate (Digital Library Production Services, UT Libraries) will show the progress they have made and discuss plans they have for further modifications to the system.

2) WolfPack: A Distributed File Conversion Framework. Christopher Kellen, Carnegie Mellon University

WolfPack is a (soon-to-be) open-source software framework used to automate the processing and OCRing of scanned images in parallel using a variety of off-the-shelf programs.

In creating a digital library, a variety of conversion programs need to operate on each scanned image; however, these programs often take considerable time and often run on different software platforms. Manually running these conversion programs is time-consuming and error-prone. In order to increase the throughput of our scanning center, we sought to automate this process of deriving files from the original scanned image.

WolfPack solves this problem by providing a framework which: analyzes the files one currently has, determines which derived files are missing, gives these required conversions to worker processes to work on, collects the derived files from those worker processes, and stores the completed work.

This distributed file conversion framework allows one to automate the various file conversions on different software platforms, perform the work in parallel, and perform the conversions around the clock, therefore increasing the overall throughput of the scanning center.

In the past year, WolfPack has been used to process over a half million pages. The WolfPack source code is being released under an open source license.

3) Navigating a Sea of Texts: Topic Maps and the Poetry of Algernon Charles Swinburne. John Walsh and Michelle Dalmau, Indiana University [image]

Topic Maps, including their XML representation, XML Topic Maps (XTM), are powerful and flexible metadata formats that have the potential to transform digital resource interfaces and support new discovery mechanisms for humanities data sources, such as large collections of TEI-encoded literary texts. Proponents of topic maps assert that topic map structures significantly improve information retrieval, but few user-based investigations have been conducted to uncover how humanities researchers and students truly benefit from the rich and flexible conceptual relationships that comprise topic maps.

The proposed poster will provide an introduction to Topic Maps and how a collection of TEI-encoded literary texts, specifically the Swinburne Project (http://swinburnearchive.indiana.edu), benefits from the use of topic maps. The poster will also provide an overview of the methodology used for the comparative usability study that was designed to assess the strengths and weaknesses of a topic map-driven interface versus a standard search interface. The interfaces that were presented to users will be demonstrated along with key findings from the usability study. Lastly, design alternatives based on the usability findings will also be presented.

The results of this study are intended to move the discussion of topic maps in the digital humanities beyond demonstrating the novel to providing evidence of the impact of Topic Maps and their extension of existing classificatory structures on the humanities researcher's discovery experience. We hope to provide those who are implementing topic maps or similar metadata structures in digital humanities resources with design recommendations that will ensure successful user interaction.

4) Implications of the Copyright Office's Recommended Legislation for Orphan Works. Denise Troll Covey, Carnegie Mellon University

This poster session will provide an opportunity to explore how the proposed legislation might impact the creation of digital libraries. Based on an analysis of the comments submitted in response to the Federal Register Notice of Inquiry, the transcripts from the public hearings, and the final report and recommendations prepared by the Copyright Office, the poster will highlight the issues and compromises relevant to libraries and archives, termed “large-scale access uses” in the final report, and invite discussion and strategic thinking about how digital libraries might leverage the suggested revision to Section 514, Limitations on Remedies, should it be enacted into law.

2:30 p.m. – 4:00 p.m.

Session 11: DLF Aquifer: Bringing Collections to Light. (Driskill Ballroom)

Katherine Kott, DLF Aquifer Director; Perry Willett and Kat Hagedorn, University of Michigan; Jon Dunn, Indiana University; Thornton Staples, University of Virginia; and Thomas Habing, University of Illinois at Urbana-Champaign [presentation]

This panel will highlight DLF Aquifer phase 1 accomplishments. Following a brief project status report, the program will focus on two project deliverables:

  1. A DLF Aquifer portal of OAI-harvested MODS records. The University of Michigan is hosting metadata harvesting for DLF Aquifer and will demonstrate the DLF Aquifer portal, which experiments with the DLF MODS Implementation Guidelines for Cultural Heritage Materials.
  2. “Asset action packages” to support a consistent user experience and deeper level of interoperability across collections and repositories. An asset action package is an XML-defined set of actionable URIs for a digital resource that delivers named, typed actions for that resource. Members of the DLF Aquifer Technology/Architecture Working Group will demonstrate the application of asset action packages to aggregated image collections in an OAI service provider (a schematic sketch of the idea follows this list).
  3. A third outcome of the past year's work, the DLF MODS Implementation Guidelines for Cultural Heritage Materials, is proposed as an interactive BOF session.
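
The following schematic sketch illustrates the asset action idea from item 2 above (named, typed actions bound to URIs for a single resource); the element and attribute names are invented for illustration and are not the Aquifer schema.

    # Schematic sketch of the asset action idea: named, typed actions bound
    # to URIs for one digital resource. The element and attribute names are
    # invented for illustration and are not the Aquifer schema.
    import xml.etree.ElementTree as ET

    EXAMPLE = """
    <assetActions resource="oai:example.org:item-42">
      <action name="getThumbnail" type="image/jpeg"
              uri="https://example.org/items/42/thumb.jpg"/>
      <action name="getFullImage" type="image/jpeg"
              uri="https://example.org/items/42/full.jpg"/>
      <action name="getMetadata" type="application/xml"
              uri="https://example.org/items/42/mods.xml"/>
    </assetActions>
    """

    def actions_by_name(xml_text):
        root = ET.fromstring(xml_text)
        return {a.get("name"): (a.get("type"), a.get("uri"))
                for a in root.findall("action")}

    # An aggregator or portal can request a named action without knowing the
    # contributing repository's internal URL structure:
    print(actions_by_name(EXAMPLE)["getThumbnail"])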

Session 12: Tools (Citadel I and II)

A) The XML Submission Tool: A System for Managing Text Collections at Indiana University.

Dazhi Jiao, Tamara Lopez, and Jenn Riley, Indiana University [presentation]

XML-based schemes like EAD and the TEI are attractive to organizations because they normalize the key concepts in a domain using a structured syntax. Both standards are document-centric, designed to be created and read by humans, and characterized by a mixture of highly structured elements with unstructured content. Because the XML standard also mandates machine-readability, a perceived benefit of using XML markup languages is system interoperability. However, unlike data-centric XML used for transaction processing, languages like the TEI and EAD are developed in an iterative editorial process that involves analysis of source text and encoding. The illusory nature of interoperability in such an environment is clear: two valid instance documents can employ the markup language and adhere to content standards in vastly different ways. The flexibility and complexity inherent in using mixed-content markup languages thus demands that digital libraries proactively manage the document creation process. This is necessary to ensure that encoding and content guidelines are followed while meeting the descriptive needs of source texts and the data model requirements of delivery and access systems. The XML Submission Tool manages the production and workflow of collections described using XML markup languages. Implemented using open-source Java software and XML technologies, it allows document creators to submit documents to collection specific rule-based content review, to review descriptive metadata, and to preview HTML delivery. In addition, the submission tool serves as an editorial repository that can be integrated with production systems and digital repositories.
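
To illustrate what a collection-specific, rule-based content check might look like, consider the sketch below; the rules and the TEI P5 namespace are invented examples, not the tool's actual implementation.

    # Minimal sketch of a collection-specific, rule-based content check of
    # the kind described above, applied to a TEI header. The rules and the
    # TEI P5 namespace are invented examples, not the tool's implementation.
    import xml.etree.ElementTree as ET

    TEI = "{http://www.tei-c.org/ns/1.0}"   # assumes TEI P5 namespaced documents

    def check_tei_header(path, required_publisher="Indiana University"):
        """Return a list of human-readable problems found in one document."""
        problems = []
        root = ET.parse(path).getroot()
        title = root.findtext(f".//{TEI}titleStmt/{TEI}title")
        if not title or not title.strip():
            problems.append("titleStmt/title is missing or empty")
        publisher = root.findtext(f".//{TEI}publicationStmt/{TEI}publisher")
        if publisher != required_publisher:
            problems.append(f"publisher is {publisher!r}, expected {required_publisher!r}")
        return problems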

B) The Archivists' Toolkit: Streamlining Production and Standardizing Archival Information.

[presentation] Bradley Westbrook, University of California, San Diego; Lee Mandell and Jason Varghese, New York University.

The Archivists' Toolkit is a multi-institution, multi-year project initially funded by the Digital Library Federation and subsequently by The Andrew W. Mellon Foundation. This project update will occur several weeks before the beta version of the AT application is scheduled to be released for testing to the project partner repositories. The project update will consist of an account of how the application specification has been modified as a result of public comment last fall, and it will describe the testing process planned for the application. A considerable portion of the presentation will be devoted to demonstrating a prototype of the application and several of its salient features such as ingest of legacy data, recording of archival resource information, and production of EAD encoded finding aids, METS encoded digital objects, and administrative reports. Substantial time will be allocated to questions from attendees.

4:00 p.m. – 4:15 p.m. Break (Mezzanine)


4:15 p.m. – 5:15 p.m. BIRDS OF A FEATHER 1

1) DLF Aquifer MODS Implementation Guidelines: Overview/Discussion of Comments and Changes (Driskill Ballroom)

Sarah L. Shreeves, University of Illinois at Urbana-Champaign; Laura Akerman, Emory University; John Chapman, University of Minnesota; Melanie Feltner-Reichert, University of Tennessee; Bill Landis, California Digital Library; David Reynolds, The Johns Hopkins University; Jenn Riley, Indiana University; Liz Milewicz, Emory University; and Gary Shawver, New York University

This BOF will span both BOF slots (4:15-5:15 and 5:25-6:25); attendees should feel free to drop in at any time. Although there may be some overlap between the two sessions, the first BOF will focus on an overview of the comments received and a discussion of the changes made to the guidelines. The second session will largely be devoted to an open discussion of the best approach to a central question raised by the guidelines and comments received: how and where to describe the original analog object and its digital surrogate.

The Metadata Working Group of the DLF Aquifer Initiative has developed a set of implementation guidelines for the Metadata Object Description Schema (MODS). The guidelines were developed to encourage creation of rich, shareable metadata that is coherent and consistent, and, thus, useful to aggregators and end users. The draft guidelines were widely distributed for community input in December 2005; the comment process ended in early February. Since then the Metadata Working Group has been reviewing comments and making changes to the Guidelines. Members of the Working Group will present an overview of the comments received and the proposed changes to the guidelines, soliciting additional feedback from members of the DLF community.

2) Global Identifier Resolution: Developers' Forum. (Maximilian Room)

Tim DiLauro, Organizer, Johns Hopkins University, and John Kunze, Organizer, California Digital Library

This BOF is for anyone interested in possible follow-on activities and topics arising in the Developers' Forum panel from Session 1 on Monday. It concerns the automatic mapping of identifiers to information objects, known as resolution, which is complicated by the diversity of available identifier schemes, resolution technologies, and expected uses. Likely topics include exploring practical collaborations in generalized and/or centralized resolution services.

A long-standing challenge for digital libraries is how to make resolution more stable and deterministic for the information objects they steward. Unable to control other providers' services, we struggle to make ongoing choices among providers, their objects and identifiers—the “Their Stuff” problem. Conversely, we also struggle to set up our own services so as to provide the best resolution experience to our users—the “Our Stuff” problem.

For example, in the “Their Stuff” category, a large amount of metadata (and more and more often, actual content) is being aggregated and indexed based on both proprietary and open harvesting protocols such as OAI-PMH. Because of the potential to harvest non-URL-based identifiers (e.g., URN:NBN, Handle) and the absence of a standard mechanism that can resolve all (or even most) of them, it is generally necessary to find a URL equivalent for each digital object in the harvested metadata. This makes it difficult to do things such as resolving to one of a number of copies, depending on which is available at a given time.

Two possible approaches to solving this and similar problems would be to generalize and/or centralize resolution. Creating a more generalized mechanism would make it easier to develop common practice—and common code—across many content stores with many identifier types. Developing a more centralized solution would obviate the need for every system that operates on identifiers to implement its own complete set of resolution services. These approaches might even encourage new service models.

3) Electronic Records Archives: Systems and Metadata Architectures. (Austin Room)

Quyen Nguyen and Dyung Le, U.S. National Archives and Records Administration

The Electronic Records Archives (ERA) system will be a future archives system in the digital object world. It will authentically preserve any type of electronic record, created by any entity in the Federal Government, and it will provide this electronic information anytime and anyplace to anyone with an interest and legal right to access it. Within such a system, whose main goals are to preserve and provide access to digital records over time, metadata management is a critical service.

In this paper, we will present typical use case scenarios of ERA that involve or require metadata management. These use cases will encompass the creation, retrieval, update, and deletion of metadata for digital records throughout the record life cycle. The ERA system has to meet multiple challenges. On one hand, ERA has to deal with challenges that are inherent to the digital object world; on the other hand, it has to fulfill the requirements posed by the business practices of the archival community in the context of NARA's mission.

We also study different database management models (relational, object-relational, native XML) for the ERA metadata repositories. The study will focus on how these technologies can satisfy the systems engineering principles of ERA such as performance, scalability, availability, backup, and recovery.

Meeting the information retrieval needs of the diverse, and potentially huge, ERA user community, given the resource limitations of ERA, is a serious challenge. We will discuss options being considered by NARA to meet this challenge. ERA is intended to exist for an essentially indefinite period of time, and its Service-Oriented Architecture provides the flexibility to evolve over time as technology changes, including changing out COTS products. There are no current or emerging standards (other than for metadata) governing the Enterprise Search arena. Hence there is a real danger of becoming locked into a particular Enterprise Search vendor's proprietary approach. The paper will discuss the related technical issues and possible mitigations.

Finally, since the ERA architecture is based on Web services technologies and is meant to be used by NARA personnel, records managers at federal agencies, and the general public, an appropriate security scheme based on user access roles has to be implemented in order to protect the integrity of record metadata.

4) Update of Activities of the DLF Services Framework Working Group. (Jim Hogg Parlor)

Geneva Henry, Rice University

The DLF Services Framework Working Group (SFWG) seeks to understand and model the research library in today's academic environment. Our mission is to develop a framework within which the services offered by libraries, both business logic and computer processes, can be understood in relation to other parts of the institutional and external information landscape. This framework will help research institutions plan wisely for providing the services needed to meet the current and emerging information needs of their constituents.

This Birds of a Feather session will provide an overview of the group's current work and the issues that have been identified to date. Approaches for creating the framework will be discussed, along with the methodologies under consideration for capturing the business logic and software needed for successful development of the framework. The group's preliminary white paper and the presentation given to the DLF Steering Committee in May 2005 (available at http://www.diglib.org/architectures/serviceframe/) provide an overview of the motivation for this work. Participants are encouraged to provide feedback and ideas that will contribute to the group's activities.

Creating a framework showing the abstraction of services that can be identified throughout digital libraries will allow a more holistic view of the information environment, facilitating better planning for the incorporation of shared services and for integration and interoperability among digital library systems and processes.

The SFWG is actively identifying existing similar efforts, such as the JISC e-Framework Initiative, that are currently underway so as to benefit from their work, avoid duplication of efforts, and leverage collaborative findings. Existing standards, policies, and protocols for identifying and describing business processes are being examined so that an appropriate model can be adopted that will allow the services framework that is developed to be commonly understood when examined by a diverse group of readers. The research institutions that are the primary target audience for this work will be included in the research being undertaken so that they will have an opportunity to provide input on the way information resources and services are provided at their institutions. Since the goal is to provide a framework that can be implemented to ensure these needs are met, it is important that these organizations understand their current landscape and how the framework can assist in future planning. A full-time researcher, the 2006 DLF Distinguished Fellow for the Services Framework Initiative, will lead the research, working with the established DLF Services Framework Working Group that was formed in 2004 and has been actively pursuing this work to date.

5:15 p.m. – 5:25 p.m. Break (Mezzanine)

5:25 p.m. – 6:25 p.m. BIRDS OF A FEATHER 2

1) DLF Aquifer MODS Implementation Guidelines. (Driskill Ballroom)

Sarah L. Shreeves, University of Illinois at Urbana-Champaign; Laura Akerman, Emory University; John Chapman, University of Minnesota; Melanie Feltner-Reichert, University of Tennessee; Bill Landis, California Digital Library; David Reynolds, The Johns Hopkins University; Jenn Riley, Indiana University; Liz Milewicz, Emory University; and Gary Shawver, New York University

This BOF will span both BOF slots (4:15-5:15 and 5:25-6:25); attendees should feel free to drop in at any time. Although there may be some overlap between the two sessions, the first BOF will focus on an overview of the comments received and a discussion of the changes made to the guidelines. The second session will largely be devoted to an open discussion of the best approach to a central question raised by the guidelines and comments received: how and where to describe the original analog object and its digital surrogate.

The comment period for the DLF Aquifer Metadata Working Group's draft implementation guidelines for MODS ended in early February. Commenters on the draft guidelines raised a couple of basic philosophical questions, focusing on how and where to describe the original analog object and its digital surrogate. The Working Group would like to discuss the different approaches recommended by reviewers and engage our user community in a face-to-face conversation about some of these questions, including the relationship between FRBR and data structures such as MARC and MODS. This will be an interactive program that will give both the Aquifer Metadata Working Group and potential users of the MODS guidelines an opportunity to discuss in real time the issues raised during the comment period.
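One commonly discussed pattern, offered here purely as illustration and not as the Working Group's recommendation, is to describe the digital surrogate at the top level of the MODS record and to push the description of the analog original into a relatedItem branch. The sketch below shows that pattern in runnable form; the element names follow MODS 3.x, but the sample values are invented.

    # Minimal sketch (not the Working Group's recommendation): describe the
    # digital surrogate at the top level and the analog original inside a
    # <relatedItem type="original"> branch. Sample values are invented.
    import xml.etree.ElementTree as ET

    MODS_NS = "http://www.loc.gov/mods/v3"
    ET.register_namespace("mods", MODS_NS)

    def q(tag):
        """Qualify a MODS tag name with its namespace."""
        return f"{{{MODS_NS}}}{tag}"

    mods = ET.Element(q("mods"))

    # Description of the digital surrogate
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = "Letter from Austin, 1898 (digital version)"
    phys = ET.SubElement(mods, q("physicalDescription"))
    ET.SubElement(phys, q("digitalOrigin")).text = "reformatted digital"
    ET.SubElement(phys, q("internetMediaType")).text = "image/jpeg"

    # Description of the analog original, nested as a related item
    original = ET.SubElement(mods, q("relatedItem"), attrib={"type": "original"})
    orig_phys = ET.SubElement(original, q("physicalDescription"))
    ET.SubElement(orig_phys, q("form")).text = "manuscript letter"
    ET.SubElement(orig_phys, q("extent")).text = "4 leaves"

    print(ET.tostring(mods, encoding="unicode"))

The opposite pattern, describing the analog original as the primary object and recording the surrogate elsewhere in the record, is equally possible in MODS, which is precisely why the guidelines need an agreed answer to this question.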

2) Archivists' Toolkit. (Maximilian Room)

Bradley Westbrook, University of California, San Diego

The Archivists' Toolkit is a multi-institution, multi-year project initially funded by the Digital Library Federation and subsequently by The Andrew W. Mellon Foundation. A brief project update and a demonstration of the Archivists' Toolkit will be presented as part of session 12 of the DLF Spring Forum. This BOF will serve as a follow-up to that presentation and will give DLF attendees the opportunity to ask additional questions about the Archivists' Toolkit application and to discuss in greater detail with project team members some of the application's design features and the functional areas being considered as additions in subsequent development phases.

3) DLF Inter-institutional Communication. (Austin Room)

Michael Pelikan, Pennsylvania State University, and David Seaman, Digital Library Federation

The Newsletter saw a big surge in submissions when we first breathed life back into it. People were pleased with the switch to XHTML and seemed to understand that their submissions were feeding not only the Newsletter but also the DLF registries.

Since then, especially in the past calendar year, the submission rate has fallen way off. I cannot browbeat submissions out of colleagues who are busy doing the very projects we'd all most like to hear about—indeed: when they're ready (or sooner!) we'll hear about them at the Forum!

I'd like to query an interested group of attendees as to some of the following:

  • What can DLF do to foster communication between its member institutions?
  • If the Newsletter is useful, what can we do to ease or normalize its timely production?
  • Along those lines, is it time for a few pilot experiments either with authenticated blogs, or a DLF-hosted wiki with authenticated editing access?
  • Should we be pushing stuff out with RSS? If so, fine.
  • Where will we get the content and who will feed it in?
  • Shall we offer to edit or redact it?
  • Which of these approaches will people actually use, buy into, and get enthusiastic about, while also giving DLF the data it needs to keep its registries up to date?

4) Central Repository for a DL How-to. (Jim Hogg Parlor)

Jewel Ward, University of Southern California and Barrie Howard, Digital Library Federation

Currently, digital library (DL) how-to information is spread out in a variety of locations online or in printed textbooks, and is often out of date. We believe there is a need for a central repository that contains current information on “how to build a digital library.” If one of the ideals of our profession is to provide access to information, the idealistic vision for this project would be “providing access to information about how to share information.”

We would like to discuss approaches to this topic, especially regarding what colleagues think is needed, what kind of information and content the site should contain, who the intended audience would be, and how this site could be created and maintained. The trick, as some have pointed out, is to create a how-to that is not so detailed it becomes useless, nor so high level that it provides little practical guidance.

The initial vision is for a publicly available Web site that covers the end-to-end building of a predefined range of digital library services from a workflow perspective. The envisioned audiences are low-resource, first- through third-world institutions around the globe that need a reference or starting point when employees are faced with, “how and where do I begin?” Another thought is that it could be a best practices portal site, as well as one that could be translated into other languages. We believe that a DL how-to site with useful content would be a nice complement to current open-source digital library software.

DAY THREE: Wednesday, April 12

8:00 a.m. – 9:00 a.m. Breakfast (Mezzanine)

9:00 a.m. – 10:30 a.m.

Session 13: Digital Library Services (Driskill Ballroom)

A) Recommendations and Ranking: Experiments in Next Generation Library Catalogs.

Brian Tingle, California Digital Library. [presentation]

During the last decade, there have been fundamental changes in the way that people find and use information on the Internet. Google, Amazon, eBay, and other successful commercial services have introduced technical approaches such as relevance ranking, personalization, recommending, and faceted browsing that have fundamentally reshaped user expectations. Search results from library catalogs, in stark contrast to those from Internet search engines, are currently not presented to the user in a transparent or usefully ranked manner. Nor do library systems offer the recommending and personalization services that are so popular with users in e-commerce settings. Recent Mellon Foundation-funded research by the California Digital Library into how library catalogs can offer such modern search features will be presented and discussed.

B) Unbundling the ILS: Deploying an E-commerce Catalog Search Solution.

Andrew Pace and Emily Lynema, North Carolina State University [presentation]

The explosive growth of the Internet and the accompanying achievements in searching technology have highlighted the weaknesses of traditional library catalogs in today's information environment. Search engines and e-commerce tools that specialize in finding and presenting useful search results have become popular alternatives for many patrons. In response, NCSU Libraries has unbundled keyword searching of the library catalog from the functionality provided by the back-office integrated system. This presentation will provide an overview of the local implementation process, including an environmental scan of the marketplace and an introduction to the commercial software chosen. A demonstration of the library's new catalog search will reveal advances in natural language searching, relevance ranking, result-set exploration, and response time, as well as new features like “true browsing” of the collection by the Library of Congress Classification scheme. The presenters will address the technical architecture and requirements for co-existence with the legacy catalog, as well as future plans (including a FRBR-like record display), usability testing, and assessment plans.

Session 14: Packaging and Performance (Citadel I and II)

A) The Music Encoding Initiative (MEI).

Perry Roland, University of Virginia [presentation]

The ability to more easily create richly and consistently encoded musical sources would support the analysis and cross-comparison of musical data by enabling activities such as building structured virtual annotated compilations of various instantiations of a work, or contextual searching and detailed data retrieval across indexed XML representations. The Music Encoding Initiative (MEI) DTD is a developing standard for such work.

The purpose of the MEI DTD is twofold: to provide a standardized, universal XML encoding format for music content (and its accompanying metadata) and to facilitate interchange of the encoded data. MEI is not designed to be an input code per se, like the Plaine and Easie code; however, it is intended to be human-readable and easily understood and applied. MEI has a significant advantage over other proposed XML standards that define an entirely new terminology, because it uses familiar names for elements and attributes. Using common music notation terminology makes MEI files more human-readable and makes clear the correspondence between MEI-encoded data and music notation. The true potential of MEI is that a single file can encode multiple variations of a musical work and generate multiple outputs. Because of its emphasis on comprehensiveness, comprehensibility, and software independence, MEI may also function as an archival data format.

The presentation will describe the features of the MEI DTD and the advantages of its use as an encoding standard. Methods for capturing data in MEI will be discussed and a brief demonstration of displaying MEI data will be given.

B) METS Profile Development at the Library of Congress: An Update.

Morgan Cundiff, Library of Congress [presentation]

The Library of Congress has continued to develop METS Profiles for specific types of digital objects. This presentation will feature recent development of profiles for audio and video Recorded Events, Photographs, Historical Newspapers, and Bibliographic Records. Explanation and demonstration of these object types will be based on items in the online application “Library of Congress Presents: Music, Theater, and Dance”. Specific topics will include: 1) developing a consistent methodology for profile creation, 2) using METS and MODS together to represent object structure, 3) creating tools for validating METS documents (i.e., checking for compliance with a given profile), and 4) moving toward METS harvesting and interoperation. Discussion from the floor will be welcomed.
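To give a concrete sense of what profile-compliance checking involves (item 3 above), the sketch below applies two invented rules to a METS file: that the root element declare an expected PROFILE URI, and that a structMap of a required TYPE be present. It is a minimal illustration under assumed requirements, not the Library of Congress's actual validation tooling, which would also validate against the METS schema and enforce each profile's full rule set.

    # Minimal sketch of a profile-compliance check; the profile URI and the
    # required structMap TYPE are invented placeholders, not real LC profiles.
    import sys
    import xml.etree.ElementTree as ET

    METS_NS = "http://www.loc.gov/METS/"
    EXPECTED_PROFILE = "http://www.example.org/mets/profiles/recorded-event"  # placeholder
    REQUIRED_STRUCTMAP_TYPE = "physical"                                      # placeholder

    def check_profile(path):
        errors = []
        root = ET.parse(path).getroot()

        # Rule 1: the root <mets> element must declare the expected profile.
        if root.get("PROFILE") != EXPECTED_PROFILE:
            errors.append(f"PROFILE attribute is {root.get('PROFILE')!r}, "
                          f"expected {EXPECTED_PROFILE!r}")

        # Rule 2: at least one structMap of the required TYPE must be present.
        struct_maps = root.findall(f"{{{METS_NS}}}structMap")
        if not any(sm.get("TYPE") == REQUIRED_STRUCTMAP_TYPE for sm in struct_maps):
            errors.append(f"no structMap with TYPE={REQUIRED_STRUCTMAP_TYPE!r} found")

        return errors

    if __name__ == "__main__":
        problems = check_profile(sys.argv[1])
        for p in problems:
            print("FAIL:", p)
        if not problems:
            print("OK: document satisfies the sketched profile rules")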

C) Automated Generation of METS Records for Digital Objects.

Nate Trail, Library of Congress [presentation]

This presentation will demonstrate a loose set of configurable tools that generate METS objects automatically from files and metadata. For Library of Congress Presents: Music, Theater and Dance, we ingest files of digitized content and merge them with metadata from various data sources to build our METS objects. The demonstration will show conversion of file-system directory structures into XML documents, SRU searching for bibliographic data, and JDBC searching for rights and other item-specific data stored in common databases. The objects are then indexed and stored for future rendering according to the METS profile for that object.

We use open source applications and tools (especially Cocoon and XSL) to interact with various data components. For each type of digitized content, we may need to interact with different databases for metadata, or expect to see different file structures and file types, so the stylesheets and Cocoon pipelines are broken into small steps that can be easily re-used or modified. This enables us to more rapidly ingest collections of digitized content according to METS profiles we develop.
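As a rough illustration of the first step mentioned above, converting a file-system directory structure into an XML document, the sketch below walks a directory of digitized content and emits an XML listing that later stages could merge with bibliographic and rights metadata. The element names are invented, and the Library of Congress workflow itself is built on Cocoon pipelines and XSL stylesheets rather than Python.

    # Minimal sketch of directory-to-XML conversion; <directory> and <file>
    # element names are invented for illustration only.
    import os
    import xml.etree.ElementTree as ET

    def directory_to_xml(root_dir):
        """Walk root_dir and return an ElementTree mirroring its structure."""
        top = ET.Element("directory", attrib={"name": os.path.basename(root_dir) or root_dir})
        nodes = {root_dir: top}
        for current, dirs, files in os.walk(root_dir):
            parent = nodes[current]
            for d in sorted(dirs):
                child = ET.SubElement(parent, "directory", attrib={"name": d})
                nodes[os.path.join(current, d)] = child
            for f in sorted(files):
                full = os.path.join(current, f)
                ET.SubElement(parent, "file",
                              attrib={"name": f, "size": str(os.path.getsize(full))})
        return ET.ElementTree(top)

    if __name__ == "__main__":
        tree = directory_to_xml("digitized_content")  # path is a placeholder
        tree.write("file_listing.xml", encoding="utf-8", xml_declaration=True)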

10:30 a.m. – 11:00 a.m. Break (Mezzanine)

11:00 a.m. – 12:30p.m.

Session 15: PANEL: The Open Content Alliance, Introduction and Progress Report. (Driskill Ballroom)

Rick Prelinger, the Internet Archive [presentation]; Robin Chandler, California Digital Library [presentation]; and Merrilee Proffitt, RLG [presentation]

In October 2005, the Internet Archive announced a partnership of libraries and technology interests including the University of California, the University of Toronto, the European Archive, the National Archives (UK), O'Reilly Media, Inc., Adobe, and Hewlett Packard Labs. Shortly after, RLG, the Biodiversity Heritage Library, Emory University, Johns Hopkins University Libraries, Rice University, University of Texas, University of Virginia, and others joined the newly formed Open Content Alliance. This unique partnership of public and private organizations seeks to digitize published, out-of-copyright material and make it freely available ... to any party.

This panel will discuss the formation of the OCA, its principles, its working groups, and what the group intends to do to meet its goal of having a mass of material online and ready for use by October 2006. The panel will allow plenty of time for audience discussion and input.

Session 16: PANEL: Listening to Users: How User Communities Can Inform Design. (Citadel I and II)

Ellen Meltzer, Felicia Poe, and Tracy Seneca, California Digital Library [presentation]

Outline of the panel:

  1. Listening to users: Creating more useful digital library tools and services by understanding the needs of user communities.
  2. The Calisphere Project: Supporting the use of university digital resources by multiple user communities.
  3. The Web-at-Risk Project: Enabling curators to capture and manage collections of Web-published government and political information.

In order to create more useful digital library tools and services, we must first understand the needs of our user communities. In this panel discussion, we will describe what the California Digital Library has learned from carrying out an array of assessment activities with our current and potential users. Through the presentation of several projects in differing stages of development, we will share our growing insight into digital library user communities, including students, faculty, K-12 teachers, librarians, archivists and others. Panelists will explore the effective use of focus groups, interviews, surveys, and usability testing.

12:30 p.m. Adjourn

POST-CONFERENCE: Wednesday, April 12

12:45 p.m. – 1:45 p.m.

METS Community Meeting—open to all (Driskill Ballroom)

1:00 p.m. – 5:00 p.m.

Developers' Forum—open to all (Chisholm Trail Room)

Attendees are asked to consider preparing an informal 5-minute micro-presentation, as described below.

Meeting Schedule

  • 1:00–1:30. An update from Stephen Abrams on the GDFR project, and some preliminary thoughts on a follow-on JHOVE project to define its next-generation architecture.
  • 1:30–3:30. Round-table 5-minute micro-presentations on “coolest new technology, most over-hyped technology, technical problems calling for group discussion, and opportunities for collaboration or standardization.”
  • 3:30–4:00. Break with light snacks.
  • 4:00–4:45. Discussion of topics for a possible technical session and BOF at the next main DLF Forum.
  • 4:45–5:00. Select the technical topic and the next temporary co-chair, if appropriate.

2:00 p.m. – 5:30 p.m.

METS Editorial Board Meeting—for participants only (Driskill Ballroom)

POST-CONFERENCE: Thursday, April 13

8:30 a.m. – 1:00 p.m.

METS Editorial Board Meeting (Austin Room)
