
DIGITAL LIBRARY FEDERATION FALL FORUM 2008
PROVIDENCE, RHODE ISLAND
NOVEMBER 12 – 14, 2008
Westin Providence
One West Exchange Street
Providence, Rhode Island
+1-401-598-8000
PRECONFERENCE: Monday, November 10
8:30 a.m. – 5:30 p.m.
METS Editorial Board (Washington, Third Floor)
DAY ONE: Wednesday, November 12
PRECONFERENCE 8:30 a.m. – 12:30 p.m.
Board of Trustees Meeting (Narragansett A, Ground Floor)
Developers' Roundtable (Bristol/Kent, Third Floor)
10:30 a.m. – 12:30 p.m.
Registration (Pre-function Area, Third Floor)
11:30 a.m. – 12:00 p.m.
First-time Attendee Orientation (Providence Ballroom, Third Floor) Barrie Howard, Digital Library Federation
Introduction 1:00 p.m. – 1:15 p.m.
Greetings and Welcome (Providence Ballroom, Third Floor) DLF Executive Director Peter Brantley
Keynote Address 1 1:15 p.m. – 2:15 p.m.
(Providence Ballroom, Third Floor) Christopher Lydon, Radio Open Source
2:15 p.m. – 2:30 p.m.
Break (Pre-function Area, Third Floor)
Keynote Address 2: What does the future of literacy look like through the lens of open education? 2:30 p.m. – 3:30 p.m.
(Providence Ballroom, Third Floor) Ahrash Bissell [PRESENTATION], cc: learn The current explosion of digital media and online content is profoundly destabilizing traditional educational structures. Similarly, library operations and presumptions are under attack, and for many, especially among our youngest generations, the library seems a quaint institution, a throwback to the days when you had to actually go somewhere to find the things you want. Once you have Internet access, who needs a school or library when you have Google? And the Internet is literally in the air all around us, penetrating our workspaces, our homes, our phones, and hence our every waking moment more deeply every day. The ease with which you can access even highly detailed information is truly breathtaking compared to the burden of performing a similar task just 10 years ago. These same phenomena also give rise to an extraordinarily powerful and compelling vision for the future of education. Perhaps counterintuitively, these changes do not require us to overhaul our educational goals; rather, they force us to acknowledge and then embrace more fully those goals that have been at the core of the educational enterprise all along. Here, I will provide you with an outline of this vision, broadly termed "open education". And I will describe how open educational resources (OER) are proving to be the disruptive, yet constructive, elements that may lead to the ascendancy of the skills-based curriculum, breaking the grip of the content-based schemes that dominate the formal educational landscape. I will also show how formal educational institutions and libraries are dealing with these issues largely in parallel. The basic principles that are motivating the global open education movement translate nicely to the library context. Through the lens of open education, the future of literacy, as mediated by libraries, looks quite exciting indeed.
3:30 p.m. – 4:00 p.m.
Break (Pre-function Area, Third Floor)
4:00 p.m. – 5:30 p.m.
Session 1 (Providence I, Third Floor)
A) EVIA Digital Archive: A Visual Take on Digital Humanities Collaboration Alan R. Burdette and Jon W. Dunn [PRESENTATION], both Indiana University

The EVIA Digital Archive project is a joint effort of Indiana University and the University of Michigan, supported by the Andrew W. Mellon Foundation, to build a digital preservation and access system for ethnographic field video. Since its inception in 2001, the project has involved a collaborative team of ethnomusicology scholars, librarians, legal experts, and university administrators to design and develop policies, workflows, tools, and infrastructure. In addition to preserving and providing access to important source content contributed by scholars, the project brings these scholars together to create detailed, peer-reviewed annotations of the content, which are then used as metadata to enable scene-level search and access. Software developed by the project, including a video annotation tool and a controlled vocabulary management tool, will be made available to the larger community as open source.

In this presentation, we will provide an overview of the project and its current status, and then discuss issues of collaboration, sustainability, and the project's significance for ethnomusicology and allied disciplines such as anthropology and folklore. We will also discuss EVIA's role in the new Institute for Digital Arts and Humanities at Indiana University, a collaborative support unit for innovative digitally-based arts and humanities research and creative activity.

B) A Busy Hive Creates Better WAX: Archiving the Web from Many Perspectives Andrea Goethals and Wendy Gogel [PRESENTATION], both Harvard University

What is web archiving? To collection managers (librarians, archivists, and faculty) it means collecting, preserving, and providing access to web content in the same way that they have always done for analog material. To lawyers it means entering a world of risk to be mitigated. To technologists (architects, programmers, graphic designers, preservationists) it means working with systems and content that are much more complicated than usual.

Harvard's Web Archive Collection Service (WAX) brought all of these players together in the last two years to design and implement a pilot web archiving service that takes all of these diverse viewpoints into consideration. Challenges include defining collections with amorphous boundaries, managing a multitude of complex IP and other legal issues, addressing QC for material too vast to review comprehensively, navigating crawler traps, coping with an explosion of formats, and handling duplicate content. Come hear the challenges encountered from multiple perspectives and see a demonstration of the WAX system.

Session 2 (Providence II, Third Floor)
A) PANEL: Content Transfer: Getting Data Moved Around the Network Michelle Gallinger and Leslie Johnston [PRESENTATION], both Library of Congress; Thomas Habing [PRESENTATION], University of Illinois at Urbana-Champaign; Steve Morris [PRESENTATION], North Carolina State University; Joe Pawletko [PRESENTATION], New York University The Library of Congress and network partners have been focusing on a program of digital content movement over the past year. This session will share the real-life results of that digital content transfer experience. The panel will feature specific successes in moving digital content among network partners as well as evolving "best practices" available for the community. The partners will share their experiences of transferring content via both network protocols (e.g., http, rsync, ftp) and tangible media (e.g., hard disk). The presentations will feature information on 1) tools and specifications developed as a result of this transfer experience, 2) how partners overcame obstacles in the transfer process, 3) workflow and packaging processes and 4) lessons learned.
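
The BagIt packaging specification grew out of this content-transfer work between the Library of Congress and its partners. As a rough illustration of the packaging approach (not code from the panel), the sketch below validates a received bag by recomputing the checksums listed in its manifest; it assumes the BagIt convention of a manifest-md5.txt file alongside a data/ payload directory.

```python
import hashlib
from pathlib import Path

def validate_bag(bag_dir: str) -> bool:
    """Recompute MD5 checksums for every payload file listed in a
    BagIt bag's manifest-md5.txt and compare against the manifest."""
    bag = Path(bag_dir)
    ok = True
    for line in (bag / "manifest-md5.txt").read_text().splitlines():
        if not line.strip():
            continue
        expected, relpath = line.split(maxsplit=1)  # "checksum  data/file"
        actual = hashlib.md5((bag / relpath).read_bytes()).hexdigest()
        if actual != expected:
            print(f"checksum mismatch: {relpath}")
            ok = False
    return ok

# Usage: a received bag is accepted only if its payload verifies.
print("bag valid:", validate_bag("transfer_bag"))
```
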
B) Quest for Improved Discovery and Access for Shared Collections Susan Harum [PRESENTATION], DLF Aquifer; Deb Holmes-Wong [PRESENTATION], University of Southern California; Colin Koteles [PRESENTATION], College of DuPage

DLF has long supported community interest in improving access to digital library collections and has sponsored projects and initiatives to support this goal. Sharing digital collections through metadata aggregation is one path to providing a higher level of exposure. Both the Distributed Library: OAI for Digital Library Aggregation project (http://www.diglib.org/architectures/oai/imls2004/index.htm) and the Aquifer initiative have been focused on making digital content easier to find and use. This presentation will describe a data analysis project that bridges the Distributed Library and Aquifer projects and will describe assessment activities for Aquifer's American Social History Online Web site.

Colin Koteles will report the results of a data analysis project that examines metadata harvested for the DLF MODS portal (http://quod.lib.umich.edu/m/mods/) created through the Distributed Library project. The analysis examines conformance to the DLF/Aquifer Implementation Guidelines for Shareable MODS Records. Colin completed this study for a Certificate of Advanced Study in Digital Libraries at the Graduate School of Library and Information Science at UIUC.

Colin used existing OAI/PMH metadata from a sampling of DLF partners to test conformance to the DLF/Aquifer Implementation Guidelines for Shareable MODS Records. The results of this analysis are used to predict both the nature and extent of future normalization processes required by OAI service providers and the nature and extent of training and education required by OAI data providers hoping to expose MODS records that are useful in a variety of shared contexts. Results emphasize the need for application profile authors to leverage the automatic computing processes available to service providers and the unique, subjective knowledge data providers have of their collections.
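
Purely as an illustration of the kind of automated conformance checking described above, the sketch below tests a MODS record for a set of required top-level elements. The REQUIRED list is a hypothetical subset; the actual DLF/Aquifer guidelines define their own elements and requirement levels.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"
REQUIRED = ["titleInfo", "typeOfResource", "location"]  # hypothetical subset

def missing_elements(mods_xml: str) -> list:
    """Return the required top-level elements absent from a MODS record."""
    root = ET.fromstring(mods_xml)
    return [name for name in REQUIRED if root.find(MODS + name) is None]

record = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo/></mods>'
print(missing_elements(record))  # -> ['typeOfResource', 'location']
```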

Deb Holmes-Wong and Susan Harum will offer a brief status report on the American Social History Online Web site, including ongoing assessment activities. The past year has seen the implementation of a faceted browsing feature, a federated search feature, a doubling of collection items, and the optimization of the American Social History Online site for Google and Zotero. This presentation will: 1) report assessment results for the implementation of the federated search feature; 2) describe preliminary results of assessment of the Web site itself; 3) describe what assessment efforts using Google Analytics and web log analysis reveal about use of the website; and 4) outline assessment plans for the remainder of the project, including a closer look at Zotero and Google and a connection to American Social History Online from within the Sakai course management system.

Session 3 (Providence III, Third Floor)
A) PANEL: Using WorldCat Grid Services in Library Applications Michael Durbin, Indiana University; Roy Tennant [PRESENTATION] and Diane Vizine-Goetz, both OCLC

OCLC's WorldCat Grid offers a wide array of bibliographic data and services via a suite of application program interfaces (APIs), including the ability to search and retrieve records from the massive WorldCat database and the ability to interact with several controlled vocabularies.

WorldCat Grid Services and the supporting OCLC Developer's Network will be introduced, and specific Grid services such as the WorldCat Search API and Terminology Services will be highlighted. Attendees will learn how they can take advantage of these services to enhance their web sites, services, or software applications.
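
As a hedged illustration of the kind of integration attendees might attempt, the sketch below queries the WorldCat Search API's OpenSearch interface. The endpoint and parameter names are assumptions based on the service as documented at the time; consult the OCLC Developer's Network for the authoritative interface, and note that a WorldCat API key is required.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

WSKEY = "YOUR_OCLC_KEY"  # issued via the OCLC Developer's Network

# Assumed OpenSearch-style endpoint; verify against OCLC's documentation.
params = urlencode({"q": "ethnographic field video", "wskey": WSKEY})
url = "http://www.worldcat.org/webservices/catalog/search/opensearch?" + params

with urlopen(url) as resp:  # returns an Atom feed of matching records
    print(resp.read(500).decode("utf-8", "ignore"))
```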

Staff from Indiana University Digital Library Program (IU DLP) will demonstrate a prototype application that queries a subject vocabulary for graphic materials available from OCLC. IU DLP staff will discuss their experiences using OCLC's Terminology Services and outline future directions for tools and services that use networked vocabularies.

B) Enabling Inter-Institutional Collaboration with Shibboleth Steven Carmody [PRESENTATION], Brown University

Current methods of access to electronic resources are far from ideal. Availability of library-licensed electronic resources addresses the need for convenience and the internet-acclimated expectation of ubiquitous availability of information. However, the problems that users encounter when accessing resources from off campus, as well as the machinations needed by library staff to manage access control and troubleshoot problems, raise the question of how to improve usability and manageability.

In 2007, the Library Shibboleth Pilot Project was started to investigate these issues and explore how federated authentication could improve the user and library experiences. This session will address use cases, recommended technologies, and proposed future directions, and will identify opportunities for international collaboration and real-world testing.

6:00 p.m. – 8:00 p.m.
Evening Reception (Narragansett A, Ground Floor)
DAY TWO: Thursday, November 13
8:00 a.m. – 9:00 a.m.
Breakfast (Waterplace Ballroom, Second Floor)
9:00 a.m. – 10:30 a.m.
Session 4 (Providence I, Third Floor)
A) A Scalable Approach to Providing Course-based Access to Library Resources Tito Sierra [PRESENTATION], Jason Casden, Kim Duckett, all North Carolina State University Academic libraries provide access to a wealth of information resources on their websites. Unfortunately, these resources are not typically organized around student needs. The NCSU Libraries Course Views project addresses this problem by dynamically generating library course pages for all courses taught at NCSU. The Course Views system employs a hybrid approach that balances auto-generated and customized content for courses while emphasizing scalability and sustainability. Standardized course descriptors are used to enable low-barrier integration with external systems such as campus learning management systems. The presentation will provide an overview of the application architecture, describe various content customization approaches, and provide an early glimpse of how the system has been used by students since its "beta" release in August 2008.
B) Deep Web Content and Internet Discovery: Exposing Harvard University Library's Digital Resources to Search Engines Roberta Fox [PRESENTATION], Michael Vandermillen, and Spencer McEwen, all Harvard University

One of the challenges of being an early adopter of a new technology is addressing subsequent paradigm shifts as that technology matures. Database-driven applications that were developed under Harvard University's Library Digital Initiative (LDI) assumed a "catalog portal" model: users were expected to go to the individual application's "home" page, or the Harvard University Library catalog, and execute a search from there to find the desired resources. The applications themselves relied heavily on cookies, sessions, and forms processing, all of which create barriers to search engine crawlers.

As students, faculty, and other users increasingly turn to search engines -- rather than library catalogs -- to meet their research needs, we reassessed our assumptions. The barriers to crawling which we had inadvertently created were keeping users from even knowing about the wealth of digital resources that our applications provide.

Not only did we need to modify the content of our pages to allow the deep web content to be crawled; additional modifications were needed both to make the search engine results presentation meaningful and to place the pages reached by such a search within a greater context.

In this presentation, we discuss our re-engineering efforts to make our 400,000+ page-turned objects, high-quality images, and other digital objects more accessible both to search engine crawlers and to those who alight on those pages after their search. We discuss our analysis of the trouble spots, our general technical approach, and specific solutions implemented.
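
The talk does not specify which mechanisms Harvard adopted, but one widely used way to expose database-driven pages to crawlers is the Sitemap protocol. The sketch below generates a minimal sitemap for a set of objects; the identifiers and URL pattern are hypothetical.

```python
# Hypothetical object identifiers and URL pattern.
object_ids = ["obj00001", "obj00002", "obj00003"]

entries = "\n".join(
    f"  <url><loc>https://example.edu/objects/{oid}</loc></url>"
    for oid in object_ids
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)
# Crawlers discover the file via robots.txt ("Sitemap: .../sitemap.xml").
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```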

C) The djatoka JPEG 2000 image server Ryan Chute [PRESENTATION] and Herbert Van de Sompel, both Los Alamos National Laboratory Research Library In this presentation, we introduce djatoka, a Java-based open source image server with an attractive basic feature set and extensibility under control of the community of implementers. Off-the-shelf, djatoka provides compression and region extraction of JPEG 2000 images, URI-addressability of regions, and support for a rich set of input/output image formats (e.g., BMP, GIF, JPG, PNG, PNM, TIF, JPEG 2000). djatoka also comes with a plug-in framework that allows transformations to be applied to regions and resolutions (e.g., watermarking). Our presentation will briefly discuss the strengths of the JPEG 2000 image file format, its application potential, and the Kakadu Software JPEG 2000 Library that is used at the core of djatoka. After an architectural overview of the djatoka solution, we will focus on the core djatoka API and the JPEG 2000 compression properties used to improve extraction performance and to provide visually lossless compression. We will show how the API can dynamically extract different resolutions, regions, and formats from a single JPEG 2000 image file, and how each of those is URI-addressable via an OpenURL. We end by discussing some of the possible image dissemination use cases, illustrated by a demonstration.
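
As a hedged illustration of the URI-addressability described above, the sketch below constructs an OpenURL request for a region of a JPEG 2000 image. The service identifier and parameter names follow djatoka's published examples of the period, but should be treated as assumptions to verify against the current documentation.

```python
from urllib.parse import urlencode

params = urlencode({
    "url_ver": "Z39.88-2004",
    "rft_id": "http://example.edu/images/page1.jp2",   # hypothetical image URI
    "svc_id": "info:lanl-repo/svc/getRegion",          # assumed service id
    "svc.format": "image/jpeg",                        # output format
    "svc.level": "3",                                  # resolution level
    "svc.region": "0,0,256,256",                       # assumed Y,X,H,W order
})
# The resolver path is djatoka's default deployment location (an assumption).
print("http://localhost:8080/adore-djatoka/resolver?" + params)
```
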
Session 5 (Providence II, Third Floor)
A) Titles and Places and Names! Oh My! -- An Experimental Graph of a Library Catalog Jerry Persons [PRESENTATION], Stanford University

The structure of records in library catalogs has a long and fruitful history. Access points (remember color-coded card headings?) grew in number and sophistication as catalogs moved from drawers to computers. That trend continues with ongoing cycles of inventive means to index, process, and display the gist of each bibliographic record in ways that allow one to pursue the right stuff among tens of millions of volumes. And the quality of these records is truly remarkable. The accuracy and consistency of data elements represent untold hours of attention to carefully authenticating headings and precisely transcribing all manner of descriptive information.

Successful and productive as this infrastructure has been, it assumes a one-to-one relationship between a structured document (a catalog record) and a bibliographic entity. No matter how many titles by a single author (Shakespeare, Mozart), libraries build and maintain an individual document (record) in the catalog for every title. No matter how many representations of a single title (Macbeth, Don Giovanni), we do the same, an individual document for every representation. It is no mean feat to maintain both accuracy and consistency across all the records in a local catalog, and the vagaries multiply when records from many libraries come together in union catalogs.

The experiment at Stanford (ongoing during fall quarter) explores the effects of taking catalog records apart to identify and then graph the relationships between unique access points. This is an effort to begin understanding what might be gained and what lost in an environment based on navigation amongst nodes in a graph that name the distinct titles, places, and names found in library catalogs. Rather than building from scratch, the work includes exploring use of the knowledge graphing tools built by Metaweb Technologies, Inc. and their content store found at http://www.freebase.com. Some of the concepts in play come from Styles, Ayers, and Shabir, Semantic MARC, MARC21 and the Semantic Web (Linked Data on the Web [LDOW2008]), http://events.linkeddata.org/ldow2008/.

The presentation aims to summarize the work and results to date as a means to stimulate discussion about:
  • this particular experiment,
  • related efforts emerging in various library and academic venues, and
  • the potential effects that aspects of various linked data initiatives might have on improving the visibility of library resources in wide web venues.

B) Digitized Text Services in a Library Catalog via a Service Oriented Approach Jonathan Rochkind [PRESENTATION], Johns Hopkins University Free services related to digitized texts are cropping up on the web, from Amazon 'search inside the book' or table of contents excerpts to digitized books at Google or the Internet Archive. If an 'OpenURL link resolver' is considered as a system for providing services for known item citations, it is one rational place to house functionality for querying the availability of such services from remote web sites. Such functionality was added to the open source Umlaut link resolver, provided directly in the library OPAC, and made available to the web pages of other library software via a light-weight 'service oriented software' philosophy. The local OPAC now includes a 'search inside' form, and links to online excerpts or full text, pre-checked for availability. A live demonstration of the software will be available. Issues of 'metadata matching' when integrating foreign third-party bibliographic databases into local library online services will be discussed.
C) Metadata for You and Me: Current and Emerging Trends in Metadata and Content Sharing Jenn Riley, Indiana University; Sarah Shreeves [PRESENTATION], University of Illinois at Urbana-Champaign The Institute of Museum and Library Services-funded Metadata for You and Me training program, based on the National Science Digital Library/Digital Library Federation's Best Practices for Shareable Metadata, is designed to teach institutions wanting to share their metadata with aggregations such as OAIster how to optimize their metadata for this purpose. The program held its last workshop funded through the IMLS grant in August 2008. Nearly 300 metadata specialists, catalogers, and other staff from a variety of institutions (including museums, libraries, historical societies, archives, public broadcasters, and educational technology units) participated in the workshop. This presentation will highlight the interesting and often enlightening range of perspectives and attitudes on the sharing of both metadata and content that emerged in these workshops. We will discuss the ongoing evolution of the training program as the metadata sharing landscape evolved over the course of the project, adding in a focus on more open, web-based sharing mechanisms such as RSS, microformats, and Linked Data. We will also discuss practical strategies to assist smaller organizations in planning for making their content and metadata more shareable.
Session 6 (Providence III, Third Floor)
"Common" Goals: The Library of Congress Flickr Pilot Project Michelle Springer and Phil Michel [PRESENTATION], both Library of Congress

On January 16, 2008, the Library of Congress (LC) launched a pilot project, uploading approximately 3000 historic photographs to a Library account (http://www.flickr.com/photos/library_of_congress) on the popular photosharing Web site Flickr.com, and inviting the public to tag and describe them. This project significantly increased the reach of the Library and demonstrated what is possible when people can access and interact with that content within their own Web communities. The overture to new audiences was welcomed warmly, and total views of the photos now approach the 10 million mark. The Flickr community has rallied to our call for assistance with a surprising level of engagement, and the quality of "history detective" work has exceeded our expectations. Approximately 500 LC Prints and Photographs Online Catalog records have been enhanced with new information provided by the Flickr Community. In coordination with the Library's launch, Flickr introduced The Commons (www.flickr.com/commons), a specially designated Flickr space for publicly held photo archive collections with no known copyright restrictions. Institutions from 5 nations are currently participating within Flickr's Commons and inviting the public to contribute information.

In this session, Michelle Springer and Phil Michel will discuss all aspects of the evolution and implementation of the pilot: the collaboration with Flickr, technical aspects (including digital image and record preparation, customized applications development, and catalog updating), resources applied, user-generated content moderation, outcomes, and benefits. The concerns raised by the inevitable loss of control over our content once we place it outside the LC Web site are part of an ongoing conversation at the Library, for which this pilot provided practical experience and concrete data on Web 2.0 risks and rewards.

10:30 a.m. – 11:00 a.m.
Break (Pre-function Area, Third Floor)
11:00 a.m. – 12:30 p.m.
Session 7 (Providence I, Third Floor)
A) User Search Behaviors within a Library Portal William H. Mischo, Mary C. Schlembach, and David Vess, all University of Illinois at Urbana-Champaign

The University of Illinois at Urbana-Champaign Library introduced a gateway portal in September 2007 that serves as a front-end to numerous information resources. Many academic libraries have deployed these "single-entry search box" type gateways. The Illinois gateway is built around a locally developed metasearch recommender and discovery system.

While there has been a great deal of research on user behavior within web search engine environments, there has been little work, other than OPAC use studies, on user behaviors within a multi-function library portal. This paper will report on user search and clickthrough behaviors within the Illinois portal.

Our metasearch software offers a variety of assisted search techniques to improve user search strategy formulation and modification, and aid in directing users to the appropriate information resources. This includes providing spelling suggestions, limiting results to phrase and title searches, author search prompts, direct resource links, and re-ordering of search results to match perceived user needs. The software suite is heavily instrumented to record user-entered search arguments, software changes in their search arguments (such as stopword removal), system-suggested actions, and user actions in response to the assisted search functions. Many of the assisted search techniques were suggested by a detailed transaction log analysis of user search arguments. We have found that user information seeking behavior in our academic environment--with regard to types of queries, user modification behaviors, the number of terms per query, search session dynamics, and other factors--differs from the behaviors reported in the web search engine literature.
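
As a purely illustrative sketch of this kind of instrumentation (not the Illinois code), the snippet below removes stopwords from a query and logs both the entered and modified forms for later transaction log analysis; the stopword list is a placeholder.

```python
import logging

logging.basicConfig(filename="search.log", level=logging.INFO)
STOPWORDS = {"a", "an", "the", "of", "in", "on"}  # placeholder list

def normalize(query: str) -> str:
    """Drop stopwords; log the change so transaction analysis can
    compare what users entered with what the system actually ran."""
    kept = [term for term in query.split() if term.lower() not in STOPWORDS]
    normalized = " ".join(kept)
    if normalized != query:
        logging.info("entered=%r normalized=%r", query, normalized)
    return normalized

print(normalize("history of the book"))  # -> "history book"
```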

B) Usability and the Harvard Geospatial Library Bonnie Burns, David Siegel, Randy Stern [PRESENTATION], and Janet Taylor, all Harvard University The Harvard Geospatial Library (HGL) is a web-based discovery system and repository for geospatial data sets for use in research and teaching, including vector data, raster images of historic maps, satellite imagery, and more. First released in 2001, it was initially aimed at an audience of GIS experts, and the user interface for searching and mapping data was modeled on expert-oriented GIS applications. With the explosion in general knowledge of mapping tools (Google Maps, et al.), and the value of including geospatial analysis in research, a wider range of users are now looking to HGL to locate relevant data sets. In 2007, the Harvard libraries conducted a usability study of HGL, in response to which the user interface was completely re-designed with a number of interesting new features. This presentation will review the objectives, methodology, and outcomes of the usability study, demonstrate the new HGL user interface with its innovative categorical and geographic browsing capabilities, and briefly touch on the technologies used to implement the new HGL. It will also review current and planned capabilities for interoperating directly with desktop GIS tools and with Google Maps.
C) The User-Focused Implementation of YuFind (VuFind) at Yale Karen Kupiec [PRESENTATION] and Kathleen Bauer, both Yale University We will review the implementation of VuFind (known locally as 'YuFind') at Yale. We will discuss 'Why here, why now?', how usability and assessment work drove the project, and our next steps.
Session 8 (Providence II, Third Floor)
A) The Role of Atom/AtomPub in Digital Archive Services at The University of Texas at Austin Peter Keane [PRESENTATION], University of Texas at Austin

The Digital Archive Services (DASe) project at UT Austin is a joint effort of the UT Libraries, the College of Liberal Arts, and the College of Fine Arts. DASe is a digital repository of images, audio, video, and PDF documents, and is used for managing, sharing, repurposing, and presenting digital media. Given the heterogeneous nature of both the assets themselves and the user populations, we have relied on the Atom Syndication Format and Atom Publishing Protocol as the primary data format for import, export, and storage.

The DASe application draws design principles from the Architecture of the World Wide Web (http://www.w3.org/TR/webarch/) and the principles of Representational State Transfer (REST) that underlie it. DASe provides a uniquely open approach to metadata schemas ("Data First" as opposed to "Structure First", cf. http://www.betaversion.org/~stefano/linotype/news/93/), which minimizes obstacles to adding new materials and allows import from and export to a wide range of metadata schemas (e.g., Atom, VRA Core, MODS).
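
To make the Atom-centric design concrete, here is a minimal sketch of an asset's metadata expressed as an Atom entry and created via an AtomPub POST. The collection URL is hypothetical, and DASe's actual entry structure is certainly richer.

```python
from urllib.request import Request, urlopen

# A minimal, valid Atom entry (id, title, updated are required by RFC 4287).
entry = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66</id>
  <title>Image asset: lecture slide 42</title>
  <updated>2008-11-13T10:00:00Z</updated>
  <author><name>DASe user</name></author>
  <summary>Schema-agnostic metadata attached as simple elements.</summary>
</entry>"""

req = Request(
    "http://example.edu/dase/collection",  # hypothetical AtomPub collection
    data=entry.encode("utf-8"),
    headers={"Content-Type": "application/atom+xml;type=entry"},
)
# urlopen(req)  # POSTing the entry creates a new member, per AtomPub
```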

Our presentation will highlight the challenges, benefits, and outcomes of this approach. We will also discuss our investigations of OAI-ORE and the possible benefits that such a standard might offer our project. Special attention will be given to DASe not as an all-in-one solution, but rather as an important player in a digital repository ecosystem that includes DSpace, Fedora, EPrints, Google Base, Flickr, Metaweb, and others.

B) Smart Storage and Preservation: How Digital Repositories Can Participate Sally Rumsey [PRESENTATION] and Neil Jefferies, both Oxford University; Adrian Brown, The National Archives of the United Kingdom; Steve Hitchcock, University of Southampton Digital preservation is perceived as one of the most complex and daunting problems repository managers face. However, this seemingly monumental task becomes more manageable if it is broken down into its component factors, and responsibilities are shared and dispersed. This is the vision of the JISC-funded PRESERV2 project. By working with a new-generation repository architecture based on shared services and storage, the project seeks to demonstrate that preservation will not only be achievable, but could become a realistic part of day-to-day repository activities. The project incorporates current developments such as repositories as part of the data flow (as opposed to a pure data store), integration of multiple platforms and storage layers, and decoupled services. It tackles the issue of scalability through automated scheduling. By dealing separately with key items in the preservation flow - format identification, planning and risk assessment, and action (including migration) - modular services can address each component problem. This contrasts with the common view that preservation is a single process to be carried out at one location. We refer to this new approach to repository preservation as smart storage, a whole-system approach based on a series of autonomous Web-based services. Smart storage combines underlying massive storage with the intelligence provided through the respective services. The paper will identify tools and services that can contribute towards smart storage, and show how repository managers can specify and select components for smart storage based on non-technical, policy-led considerations.
C) Comparative Performance Study of Digital Object Format Identification and Validation Tools Quyen Nguyen [PRESENTATION], U.S. National Archives and Records Administration

With the advent of information technology, it is imperative that we have a scheme to preserve born-digital objects such as the products of office automation, geospatial images, and multimedia artifacts. As the first step of any preservation process, it is critical to be able to identify and validate the format of a digital object and extract the technical metadata embedded in that object. Some institutions have developed tools that can discover the format of a digital object beyond using file extensions.

In this paper, we studied and evaluated two tools that are well known in the domain of digital archiving and preservation:
  • JHOVE (JSTOR/Harvard Object Validation Environment)
  • DROID (Digital Record Object Identification)

The extent of each tool's capabilities against the same corpus of sample documents will be examined. We will also look at their respective performance, which is very important given the massive number of digital objects and the practical deadlines of the ingest phase.

For types that are known to JHOVE, we found that there was no statistically significant difference in execution times between the two tools. With a mix of file types known and unknown to JHOVE, however, DROID seemed to fare better than JHOVE in terms of execution time, while JHOVE's processing generated richer metadata, which can be used as information in the asset catalog. Given this trade-off, we propose a software architecture that could leverage the benefits of both JHOVE and DROID in terms of performance and metadata generation.
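
A rough sketch of how such a timing comparison can be run appears below. It shells out to each tool and times the calls; the exact command-line invocations vary by tool version, so the commands shown are placeholders rather than the study's actual harness.

```python
import subprocess
import time

def time_tool(cmd: list) -> float:
    """Run one tool invocation and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

corpus = ["sample.pdf", "sample.tif", "sample.xyz"]  # mixed known/unknown types
for path in corpus:
    t_jhove = time_tool(["jhove", path])        # placeholder invocation
    t_droid = time_tool(["droid", "-a", path])  # placeholder invocation
    print(f"{path}: jhove={t_jhove:.2f}s droid={t_droid:.2f}s")
```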

Session 9 (Providence III, Third Floor)

PANEL: Taking the Next Step with Digital Collections: Innovative Teaching, Learning & Research in Virtual Worlds

Instructions for viewing the transcript [TRANSCRIPT]:
  1. Log on to the DLF wiki using the key: Entrop1a
  2. Scroll to the bottom of the page and click on “Show All Pages,” the second link in the “Wiki Information” column
  3. Click on the link for “Transcript: Taking the Next Step with Digital Collections”
  4. If the page is blank, scroll to the bottom of the page and click on “Enable Scripts” in the “Page Information” column

Panelists: Leslie H. Jarmon, University of Texas at Austin; Jeremy Kemp, San Jose State University; Elizabeth McAulay, University of California, Los Angeles; Jeffrey Schnapp, Stanford University; Sheila Webber, University of Sheffield; Noah Wittman, University of California, Berkeley; Mansfield, University of Hertfordshire; Xavier, Universidad Autonoma de Madrid; Esther Grassian, University of California, Los Angeles; Deni Wicklund, Stanford University

    UC Berkeley students received a $5000 NMC 2008 Virtual Learning Prize to conduct research on representations of archaeological sites in virtual worlds. They are using Okapi Island, a Second Life (SL) reproduction of a Turkish archaeological site, to conduct this study. Co-managed by Open Knowledge and the Public Interest (OKAPI) and the Berkeley Department of Anthropology, Okapi Island offers 65,000 square meters of virtual real estate for exploring new forms of research, education and public outreach.

    The University of Sheffield's InfoLit iSchool focuses on information literacy research. The iSchool has held discussions in SL and sponsored tours of SL sites, including a "multiple intelligences" site based on Howard Gardner's work, and a computer science course site where students engage in inquiry-based learning by playing games at an SL carnival, conducting group projects, and personalizing SL condos.

    Development continues on SLoodle, the open source SL version of Moodle, with a walkthrough tutorial, introductory SLoodle classes, and a downloadable SL SLoodle toolbar already in place. A new controller, under development for version 0.3, will allow instructors to customize their SLoodle sites for individual courses.

    These are examples of virtual world endeavors. What are some innovative means of using digital collections to stimulate, support and enhance virtual world teaching and research projects like these? Come to a "mixed reality" session on this topic, with a real life (RL) audience at the DLF Forum, and panelists located in various countries and other parts of the U.S., most participating from within SL.

12:30 p.m. – 2:30 p.m.
Break for Lunch and Networking [Individual choice]
Birds-of-a-Feather Sessions 2:30 p.m. – 3:30 p.m.
A. JHOVE2 Needs Assessment and Functional Requirements (Providence I, Third Floor) Stephen Abrams [PRESENTATION], California Digital Library; Evan Owens, Portico; Tom Cramer, Stanford University The open source JHOVE characterization tool has proven to be an important component of many repository and preservation workflows. However, its widespread use over the past four years has revealed a number of limitations imposed by idiosyncrasies of design and implementation. CDL, Portico, and Stanford University are now collaborating on a two-year NDIIPP-funded project to develop a next-generation JHOVE2 architecture. Among the enhancements planned for JHOVE2 are streamlined APIs; increased performance; a more sophisticated data model supporting complex multi-file objects and arbitrarily-nested container objects; a generic plug-in mechanism supporting stateful multi-module processing; and automated rules-based assessment. The project partners are currently engaged in a public needs assessment and requirements-gathering phase. A provisional set of functional requirements has already been reviewed by the JHOVE2 advisory board and a European audience at the recent iPRES conference. We would now like to take advantage of the DLF community to help us refine these requirements and prioritize our development efforts. We expect a lively and detailed discussion during this session. Participants are asked to closely review the functional requirements and other materials that will be made available on the project wiki and announced via the DLF-Announce mailing list prior to the session.
B. Developing A Community for the Djatoka JPEG 2000 Image Server (Providence II, Third Floor) Ryan Chute, Los Alamos National Laboratory Research Library Djatoka is a recently released Java-based open source API and image server with an attractive basic feature set and extensibility under control of the community of implementers. Off-the-shelf, djatoka provides compression and region extraction of JPEG 2000 images, URI-addressability of regions, and support for a rich set of input/output image formats (e.g., BMP, GIF, JPG, PNG, PNM, TIF, JPEG 2000). This session intends to gather use cases, requirements, and solicit feedback regarding the organization of a development community.
C. Re-imagining METS Profiles (Providence III, Third Floor) Jenn Riley [PRESENTATION], Indiana University; Brian Tingle, California Digital Library; Nancy Hoebelheinrich, Stanford University METS Profiles present a valuable mechanism for documenting local METS practice, or prescribing restrictions on the use of METS for a specific resource format or a specific software application. The METS Editorial Board is considering some changes to the METS Profile schema, on which we would like community feedback. A draft revised Schema will be available for review prior to the Forum, and a preliminary proposal for Profile revisions is available on the METS Wiki at http://www.socialtext.net/mim-2006/index.cgi?mets_profiles_discussion_page. Specific additions under consideration include: the addition of some XHTML elements into documentation sections, the ability to embed XML snippets within the Profile itself, a new "Document Model" section, and a new "Use Cases" section. At this BOF session, we will also gather feedback on design ideas for a web-based tool for creating METS Profiles, and discuss community needs for machine-actionable versions of METS Profiles.
D. ILS-Discovery Interface Implementations (Providence IV, Third Floor) John Mark Ockerbloom, University of Pennsylvania This session is an opportunity for developers of interfaces implementing the DLF's ILS-Discovery interface recommendations to present their work to others, ask and answer questions about the recommendations and their implementations, and discuss further development initiatives and coordination.
3:30 p.m. – 4:00 p.m.
Break (Pre-function Area, Third Floor)
4:00 p.m. – 5:30 p.m.
Session 10 (Providence I, Third Floor)
A) PANEL: Fedora Commons and DSpace Foundation collaboration Michele Kimpton, DSpace Foundation; Sandy Payette, Fedora Commons

During the summer of 2008, a number of discussions were held between Fedora Commons and the DSpace Foundation about possibilities for collaboration between the two non-profit organizations in ways that could best serve both communities and add value. Both communities were invited to participate and give input during the course of several virtual and face-to-face meetings.

Several short-term projects have been defined and are actively being worked on to enable the two platforms to interoperate more seamlessly, to allow joint web services to be built on top of either platform by defining common standards, and to work collaboratively with outside partners to define new services and solutions the community at large is interested in.

In the longer term, the Executive Directors (Payette and Kimpton) agreed that both organizations would benefit by looking beyond their existing systems to new joint opportunities that would create lightweight, web-based services and solutions. Payette and Kimpton realized that together they have an asset in the form of a sizable community of committed open source developers and users that can not only keep the existing open source products alive, but can be leveraged to produce new value in the competitive terrain in which we exist.

Both Payette and Kimpton will present in detail the projects and progress to date, and outline future activities to further the collaboration.

B) Defining "Noncommercial Use" Virginia Rutledge, Creative Commons

This presentation will report on a research study designed to explore differences between commercial and noncommercial uses of content, as those uses are understood among various communities and in connection with a wide variety of content. The study is being undertaken by the nonprofit organization Creative Commons, with the generous support of the Andrew W. Mellon Foundation.

Creative Commons provides free copyright licenses to creators who want to give the public certain permissions to use their works, in advance and without the need for one-to-one contact between the creator and the user. Since the licenses were introduced in 2002, they have been translated into 47 legal jurisdictions and adopted by content creators around the globe, from remix musicians to educator consortia, bloggers to book publishers. At present over 130 million works are available on the Internet under Creative Commons licenses, each work tagged with metadata expressing the rights and permissions associated with it. Almost two-thirds of all CC-licensed works today include the "Noncommercial" or "NC" term. These works may be used by anyone for any purpose that is not "primarily intended for or directed toward commercial advantage or private monetary compensation", provided the use also complies with the other terms of the license.
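
Because each CC-licensed work carries machine-readable license metadata (ccREL's rel="license" link pattern), the NC term can be detected programmatically. The sketch below is a simplified illustration; real pages vary in attribute ordering and markup.

```python
import re

html = ('<a rel="license" '
        'href="http://creativecommons.org/licenses/by-nc/3.0/">'
        'Some rights reserved</a>')

# Simplified pattern: assumes rel appears before href on the same tag.
m = re.search(
    r'rel="license"[^>]*href="(http://creativecommons\.org/licenses/[^"]+)"',
    html,
)
if m:
    uri = m.group(1)
    noncommercial = "-nc" in uri  # matches by-nc, by-nc-sa, by-nc-nd
    print(uri, "->", "noncommercial only" if noncommercial else "commercial use allowed")
```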

As adoption of Creative Commons licenses spreads exponentially, the organization is undertaking this study as part of its efforts to provide information to creators about the contexts in which the NC term may further or impede their intentions with respect to the works they choose to share, and to make sure that users clearly understand those intentions. The study findings, which will be publicly released in 2009, are expected to help improve the licenses where possible, but also to contribute generally to better understanding of the complexities of digital distribution of content.

Session 11 (Providence II, Third Floor)
A) Access to Individual Harvested Sites in a Web Archive Tracy Meehleib [PRESENTATION], Library of Congress

As libraries assume responsibility for web archiving as part of their collection development, there is increasing pressure placed on them to capture, store, and provide access to this born-digital content. For the past four years the Library of Congress has been experimenting with and developing tools and methodologies to make website capture and access more efficient, while at the same time striving to ensure that these new resources are searchable and indexable along with other library collections.

The Library of Congress' workflow for creating resource descriptions of archived websites combines automated techniques (metadata extraction from archived files) and human techniques (increased subject access enrichment by catalogers) to provide site-level access, facilitating searching and browsing within and across web archives. In addition, the intellectual/historical context is embodied within the archive itself and described in collection-level records and descriptive background to the content written by curators and subject specialists.

As a result, the Library of Congress Web Archive project has made significant progress towards creating a sustainable infrastructure for making web archives available for digital scholarship.

B) PANEL: Preserving Public Government Information: The 2008 End of Term Crawl Project Abbie Grotke [PRESENTATION], Library of Congress; Kris Carpenter, Internet Archive The Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office have joined together for a collaborative project to preserve public United States Government web sites at the end of the current presidential administration ending January 19, 2009. The panel will describe how the collaboration came about and the scope of the project, and will discuss how the partners and volunteer nominators -- government information specialists including librarians, political and social science researchers, and academics -- are working together to document federal agencies' presence on the World Wide Web during the transition of government. The panelists will also describe how this effort is enhancing the existing collections of the five partner institutions.
Session 12 (Providence III, Third Floor)
A) Digital Performance Institute Alyce Dissette and Liz Dreyer, both Digital Performance Institute The Digital Performance Institute (DPI) is developing a comprehensive multi-disciplinary performing arts production database (PADb) on the Internet. The goal is to establish a central and user-friendly online "hub" as a source of information about specific productions, including theater, dance, opera, interdisciplinary and experimental works, and the artists involved in making them. The PADb site is targeted at general reference, primarily for professionals, but will also be useful to academics and audiences.
B) Listen Up! Australian Oral History and Folklore Recordings Online Ingrid Finnane, National Library of Australia

The National Library of Australia has an ongoing digitization program for preservation of its sound recording collection. Currently 50% of the oral history and folklore collection is digitized and stored on our Digital Object Storage System. However, researchers who want access to the material have had to visit the Library in Canberra or request a copy sent on physical media.

This year we released the first stage of a delivery system that includes time-point links from summaries and full text transcripts to audio. Researchers now have immediate online access to selected content with open access conditions.

This presentation will describe how we used METS and the TEI text encoding standard to build the new delivery system, the challenges encountered during the project, and the current workflow from digitization to online delivery.

DAY THREE: Friday, November 14
8:00 a.m. – 9:00 a.m.
Breakfast (Waterplace Ballroom, Second Floor)
9:00 a.m. – 10:30 a.m.
Session 13 (Providence III, Third Floor)
PANEL: Asserting Fair Uses Jonathan Band, policybandwidth; Georgia Harper [PRESENTATION], University of Texas at Austin; Molly Kleinman, University of Michigan; Virginia Rutledge, Creative Commons; Gretchen Wagner, ARTstor This panel will address the state of fair use. Panelists will discuss strategies for asserting fair use effectively, and will candidly expose some of their fair use fears and fantasies in the process. The panel will include perspectives from Jonathan Band who helps shape the laws governing intellectual property and the Internet through a combination of legislative and appellate advocacy; Georgia Harper, Scholarly Communications Adviser for the University of Texas at Austin; Molly Kleinman, Copyright Specialist and Special Projects Librarian at the University of Michigan Library; Virginia Rutledge, Special Counsel to Creative Commons; and Gretchen Wagner, General Counsel of ARTstor.
Session 14 (Providence II, Third Floor)
A) Preserving Brand-new Buildings: Digitally Archiving 3D CAD and Related Architectural Materials William Reilly [PRESENTATION], [PRESENTATION as WEB PAGE], MacKenzie Smith (Associate Director for Technology), Ann Whiteside (Head, Rotch Library of Art and Architecture), all Massachusetts Institute of Technology

Architectural collections within libraries, archives, and museums are increasingly faced with acquiring and preserving the artifacts of Computer-Aided Design (CAD) development, yet have few tools or approaches with which to address this today. This paper reviews the relevant issues faced to date by FACADE ("Future-Proofing Architectural Computer-Aided Design"), a two-year IMLS-funded research project being conducted by the MIT Libraries along with the School of Architecture and Planning. FACADE is charged with investigating how best to archive the highly proprietary, internally complex, and potentially short-lived digital artifacts of contemporary 3D CAD modeling tools, along with related files and materials as generated in real-world datasets for building construction projects received from prominent architects. With an overall goal of developing production-quality open source software to permit collections to capture, describe, manage, preserve, and make available these digital CAD models, the project has engaged with a number of areas of interest. The ACE domain (architecture, construction, engineering) has its own set of data exchange formats in various stages of maturity and gestation (STEP, IFC, IGES, etc.); these form the basis for our exploration of preservation formats and strategies. Format registry entries for the native CAD tools in GDFR and/or PRONOM, and code to integrate DSpace with those registries, are another area of investigation. An ontology for the kinds of materials received has been designed, and some prototypical tools developed, to permit the creation of metadata (RDF) on a large set of materials almost always otherwise lacking in meta-information.

B) Watching Our Backs: Community Verification of Digital Preservation Systems John Mark Ockerbloom [PRESENTATION], University of Pennsylvania

Librarians and faculty agree that information preservation is one of the essential roles of libraries. Yet, as the information we manage increasingly becomes digital, we have to rely on new methods of preserving this information that have not been fully tested. While developing and auditing for best practices is important, we must also verify that preservation systems actually perform as we hope they will, preferably long before we have to fall back on them.

In this talk, I will show ways in which this verification can be done now, by the community, with reasonable cost and demonstrable efficacy. Specifically, I will describe Penn's failure recovery tests of LOCKSS, which uncovered issues with the system's performance and reliability, and helped lead to improvements addressing these issues. I will also discuss initiatives being organized through CRL to assess distributed auditing and community knowledge sharing to test and improve LOCKSS, Portico, and other shared preservation systems.

Session 15 (Providence I, Third Floor)
Optimizing Your Metadata and Digital Content for Sharing with MODS and Asset Actions Jon Dunn and Jenn Riley, both Indiana University

The DLF Aquifer Metadata Working Group has created guidelines for using the Metadata Object Description Schema (MODS) effectively when preparing metadata for use in aggregations. The Technology Working Group has been a primary force behind the development of Asset Actions, a standardized, easy-to-implement mechanism for exposing behaviors across a class of digital objects. Both developments enhance a digital library user's ability to find, identify, use, and re-use digital objects.

This 90-minute hands-on workshop will assist participants in preparing Asset Actions for items in their collections, help participants improve their MODS records' shareability and conformance to the Aquifer MODS Guidelines, and/or effectively map metadata from other formats into MODS optimized for aggregation. Members of the DLF Aquifer working groups that developed the standards will demonstrate and be on hand to assist participants. Bring a laptop with an XML editor of your choice, the knowledge and capability to transform XML documents using XSLT, and records for items from your repository that you want to experiment with.
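
For a taste of the workshop's core task, here is a minimal sketch of an XSLT transformation into MODS, run with the lxml library. The stylesheet maps only a Dublin Core title and is purely illustrative; it is not the Aquifer guidelines mapping.

```python
from lxml import etree

# Toy stylesheet: map dc:title into a MODS titleInfo/title.
xslt = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://www.loc.gov/mods/v3">
  <xsl:template match="/record">
    <mods>
      <titleInfo><title><xsl:value-of select="dc:title"/></title></titleInfo>
    </mods>
  </xsl:template>
</xsl:stylesheet>""")

source = etree.XML(b"""\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letters from the Home Front</dc:title>
</record>""")

result = etree.XSLT(xslt)(source)
print(etree.tostring(result, pretty_print=True).decode())
```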

At the end of the session, participants should better understand what a best practice-based implementation of Asset Actions or MODS for their collections should look like, and be able to plan for implementing this understanding in their repository.

10:30 a.m. – 11:00 a.m.
Break (Pre-function Area, Third Floor)
11:00 a.m. – 12:30 p.m.
Session 16 (Providence I, Third Floor)
A) Massachusetts Institute of Technology (MIT) GeoWeb Expands Access to GIS data through Open Source Tools Lisa Sweeney [PRESENTATION], Massachusetts Institute of Technology

MIT GeoWeb, a new interface to the MIT Geodata Repository, enables MIT community users to access Geographic Information Systems (GIS) data through a standard web browser. The web interface allows users to search, view, and download GIS data and metadata from the MIT Geodata Repository, a collection of international GIS data maintained by MIT GIS Services. Users will find data in the MIT system that is not freely available on the web, and can view or download the data and manipulate and analyze it in whatever system they choose. GeoWeb combines geographically and visually based search types with more traditional text-based searching.

GeoWeb was built in Spring 2008, using GeoServer, ArcSDE, TileCache, and PHP on the server side, and with the OpenLayers and jQuery JavaScript libraries on the client side. Development tools included text editors and Firebug. We have experienced various challenges and learned a lot from merging proprietary and open source systems.
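
Behind an interface like GeoWeb, OpenLayers typically talks to GeoServer through OGC web services such as WMS. As a generic illustration (the host and layer name are hypothetical), the sketch below builds a standard WMS GetMap request.

```python
from urllib.parse import urlencode

params = urlencode({
    "service": "WMS", "version": "1.1.1", "request": "GetMap",
    "layers": "mit:geodata_layer",   # hypothetical layer name
    "styles": "",
    "srs": "EPSG:4326",              # lat/lon coordinate reference
    "bbox": "-180,-90,180,90",       # whole-world extent
    "width": "512", "height": "256",
    "format": "image/png",
})
print("https://example.mit.edu/geoserver/wms?" + params)  # hypothetical host
```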

Enabling searching through a web browser eliminates the barrier of needing specialized software to discover and access GIS data. It also opens up possibilities for connecting different systems within the MIT Libraries as well as with other institutions: for instance, making MARC map records searchable through GeoWeb alongside all digital GIS materials, or connecting with GIS systems beyond MIT.

B) OpenMIC Project and the MIC Registry Jane Otto [PRESENTATION], Library of Congress

OpenMIC is an open source, web-based cataloging tool that can be used as a standalone application or integrated with other repository architectures by a range of organizations. It provides a complete metadata creation system for analog and digital materials, with services to export these metadata in standard formats. Its features include the following (a skeletal example of a METS export appears after the list):

  • Full METS support
  • Low overhead and infrastructure requirements
  • Events-based data model for management and rights documentation
  • Customization capabilities
  • Mapping from both standard and local in-house schema
  • Import and export utilities
  • Unicode and CJK vernacular character support
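
Below is the skeletal sketch promised above: the general shape of a METS document such as an export might produce. The identifiers and file URL are hypothetical, and a real OpenMIC export would carry much richer descriptive, administrative, and rights metadata.

```python
mets = """<?xml version="1.0" encoding="UTF-8"?>
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <dmdSec ID="dmd1">
    <mdWrap MDTYPE="MODS">
      <xmlData><!-- descriptive MODS record would go here --></xmlData>
    </mdWrap>
  </dmdSec>
  <fileSec>
    <fileGrp>
      <file ID="file1" MIMETYPE="video/mpeg">
        <FLocat LOCTYPE="URL" xlink:href="http://example.edu/clip1.mpg"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap>
    <div TYPE="movingImage"><fptr FILEID="file1"/></div>
  </structMap>
</mets>"""

with open("openmic-export.xml", "w", encoding="utf-8") as f:
    f.write(mets)
```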

The OpenMIC cataloging utility is one component of the MIC portal for moving images. MIC (Moving Image Collections) is a preservation, access, and education initiative co-sponsored by the Library of Congress and the Association of Moving Image Archivists (AMIA), in partnership with Rutgers University Libraries, the technology lead. MIC (pronounced 'Mike') integrates a union catalog, directories, and the cataloging utility with informational resources in a portal structure delivering customized information on moving images, their preservation, and the images themselves to archivists, researchers, educators, and the general public. MIC's mandate is both to document the national moving image collections environment and to serve as a coordinator and catalyst for the preservation of the nation's moving image heritage. MIC's mission is to immerse moving images into the education mainstream, recognizing that what society uses, it values, and what it values, it preserves.

Session 17 (Providence II, Third Floor)
Google Book Search: An Update Dan Clancy and Jon Orwant, both Google As part of its mission to organize the world's information, Google is scanning the world's books to make them discoverable, searchable, and -- where rights permit -- readable. In this talk, we'll describe the current state of the effort, focusing on the challenges involved in determining copyright status.
12:30 p.m.
Adjourn
POST-CONFERENCE 1:00 p.m. – 5:00 p.m.
Project Manager's Group Meeting (South County, Third Floor) [VINOPAL PRESENTATION]

[STEDFELD PRESENTATION]
