DLF Spring Forum 2008 Program

DAY ONE: Monday, April 28

PRECONFERENCE 8:30 a.m. – 12:30 p.m.

Board of Trustees Meeting (Mirage Room, Second Floor)

Developers' Roundtable (Skyway Suite AB, Second Floor) John A. Kunze, chair (California Digital Library)

DLF/TEI-C Joint Meeting—for project participants (Minnehaha, Second Floor)

Aquifer Services Working Group Meeting—for project participants (Board Room, First Floor)

JHOVE Meeting—for project participants (Greenway I, Second Floor)

10:30 a.m. – 12:30 p.m.

Registration (Greenway Promenade, Second Floor)

11:30 a.m. – 12:00 p.m.

First-time Attendee Orientation (Greenway C – H, Second Floor) Barrie Howard, Digital Library Federation

Introduction 1:00 p.m. – 1:30 p.m.

Greetings and Welcome. (Greenway C – H, Second Floor) DLF President Carol A. Mandel, New York University; DLF Executive Director Peter Brantley

Keynote Address 1: Giving Maps a Second Life with Digital Technologies 1:30 p.m. – 2:30 p.m.

(Greenway C – H, Second Floor) David Rumsey [PRESENTATION], David Rumsey Historical Map Collection

David Rumsey will show how his increasing use of digital technologies and the Internet over the past decade has transformed his work as a historical map scholar and collector. Using imaging software, GIS, and popular applications like Google Earth and Second Life, Rumsey has given new life to old maps, both in their dissemination and our ability to analyze and understand them, thereby unlocking the information held in maps for use in a wide range of disciplines. He will discuss and demonstrate how he offers these software tools and a growing number of digitized maps themselves on his free public online map library at www.davidrumsey.com.

Keynote Address 2: The Invisible Computer Revolution: Information Opportunities in the Developing World 2:30 p.m. – 3:30 p.m.

(Greenway C – H, Second Floor) Joel Selanikio [PRESENTATION], DataDyne

Africa is leading the world in year-over-year growth in mobile phone penetration, and other parts of the developing world are close behind. Along with the internet, with which it is rapidly merging, this is the most astonishing technology story of our time, and one that has the power to revolutionize access to information across the developing world, because every single cell phone is a pocket computer more powerful than the original IBM PC or Apple Macintosh, with the added advantage of being wirelessly connected to the network.

In the very near future every schoolteacher and every health worker in every developing country will have such a pocket computer, capable of requesting and receiving information from the information repository of the internet. Every businessperson and every government official will be able to communicate more widely, and for less money, than ever before, using the computer in their pocket. And that computer will be the communications tool, and the schoolbook, and the vaccination record, and the family album, and more.

All this means that those in the information business need to stop waiting for aid agencies to provide laptops or desktops for schoolteachers (much less schoolchildren): the information revolution for developing countries is already underway. Instead we should ask ourselves "what mobile information software can we create -- can we IMAGINE -- that would really add value for a schoolteacher (or student, or health worker, or businessperson) and that could run on the computer they already have in their pocket?", and make it happen.

3:30 p.m. – 4:00 p.m.

Break (Greenway Promenade, Second Floor)

4:00 p.m. – 5:30 p.m.

Session 1 (Greenway C – E, Second Floor)

A) GIS in Service of Historical Bibliography. James Nye, University of Chicago This paper assesses the value of GIS for studies on the history of the book in colonial South Asia. It draws upon work completed under the South Asia Union Catalogue (SAUC) in tagging historical bibliographic data with geo-coordinates for the place of publication, and upon future extensions of the SAUC data set that will support inquiry by scholars studying the history of publications from the subcontinent. The paper also evaluates our approach with the British Library in augmenting their bibliographic records from conversion of the British Museum General Catalogue of Printed Books through addition of fixed field data for country of publication using GIS data prepared under the Digital South Asia Library program.

B) Digital Content Management at Scale: A Case Study from Portico. Evan Owens [PRESENTATION], Portico In the management of any type of digital content, scale matters. The problems are different at different orders of magnitude. In 2007 Portico increased its content preservation processing capacity by a factor of 60 to a current capacity of around one million files per day. This presentation will review the technical and organizational challenges encountered and solved along the way. While the specifics are particular to the technologies and infrastructure used by Portico, the issues involved are typical of any large content management project: the optimization of infrastructure and applications as well as human processes to meet the demands of very high volumes of content.

Session 2 (Greenway F – H, Second Floor)

A) The Disappearing Data Problem: Preserving Today's Geospatial Data to Meet Tomorrow's Temporal Analysis Needs. Steve Morris [PRESENTATION], North Carolina State University Digital geospatial data typically consists of complex multi-file, multi-format objects which do not suffer well from neglect. Many of these data resources are subject to versioning over time and it is common for such data to have dependencies on other objects. The shift towards web services or API-based access to data and map resources raises new challenges, as there is no guarantee that an image or dataset viewed through a service today will be available tomorrow. The geospatial industry, which one might characterize as being temporally impaired, is just beginning to come to grips with the issue of sustaining temporal content. Business cases for data retention are provided by the increasing number of applications requiring the use of old data. The North Carolina Geospatial Data Archiving Project, one of the Library of Congress NDIIPP partnership projects, has focused on preservation of state and local agency geospatial data. A central theme of the project involves engaging existing spatial data infrastructure (SDI) in the data preservation effort. While technical issues do fall within the project scope, the real focus is on exploring the organizational and cultural issues associated with making data archiving a part of current practice within the geospatial data community. Project findings, surprises, and ongoing challenges will be presented.

B) A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery. Stacy Kowalczyk and James Halliday, both Indiana University IN Harmony: Sheet Music from Indiana is a multi-year, IMLS-funded project with the Indiana University Digital Library Program, Indiana University's Lilly Library, Indiana State Museum, Indiana Historical Society, and Indiana State Library. The project has two major deliverables: a tool for collecting metadata for sheet music and an online discovery and delivery system. With multiple partners and a variety of workflows, we needed to architect a flexible system for distributed digitizing, cataloging and authority control, with centralized quality control for digitizing and metadata as well as centralized data storage and delivery. Based on Fedora, this system has an innovative design both to manage the flow of data and to deliver data to researchers.

C) An OAI-ORE Aggregation for the National Virtual Observatory. David Reynolds [PRESENTATION], Tim DiLauro, and Sayeed Choudhury, all Johns Hopkins University Johns Hopkins University, the National Virtual Observatory (NVO), and its partners are developing a data curation prototype system that connects data deposition with the publishing process using a Fedora-based repository at its foundation. As part of this effort, the development team is evaluating the use of the Open Archives Initiative - Object Reuse and Exchange (ORE) specification to create an ORE Aggregation that models the relationships between the various resources related to a particular publication (e.g., an article, the data it describes, and references cited). As stated on the OAI-ORE web site (http://www.openarchives.org/ore/), OAI-ORE provides "specifications that allow distributed repositories to exchange information about their constituent digital objects. These specifications will include approaches for representing digital objects and repository services that facilitate access and ingest of these representations." This presentation will outline observations from the data modeling process, illustrate the mapping of the model to an ORE Aggregation (and its associated ORE Resource Map or ReM) for the NVO data curation system, and identify linkages between the ORE Aggregation and other conceptual models. Finally, the presentation will identify potential next steps to advance this work into other domains.
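As a rough illustration of the mapping described in the abstract above, an ORE Resource Map can be thought of as a small set of statements: the map describes one Aggregation, and the Aggregation aggregates the resources tied to a publication. The sketch below is not the presenters' model. The function name and the example.org URIs are hypothetical, and the only vocabulary assumed is the `describes` and `aggregates` terms from the ORE namespace.

```python
from typing import List, Tuple

# ORE vocabulary namespace; only "describes" and "aggregates" are used here.
ORE = "http://www.openarchives.org/ore/terms/"

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def resource_map_triples(
    rem_uri: str, aggregation_uri: str, resource_uris: List[str]
) -> List[Triple]:
    """Build the core statements of a Resource Map (ReM).

    The ReM describes exactly one Aggregation, and the Aggregation
    aggregates each resource related to the publication.
    """
    triples: List[Triple] = [(rem_uri, ORE + "describes", aggregation_uri)]
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


# Hypothetical identifiers for an article, its data set, and a cited reference.
triples = resource_map_triples(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    [
        "http://example.org/article/1",
        "http://example.org/dataset/1",
        "http://example.org/reference/1",
    ],
)
```

A real Resource Map would also carry metadata about the map itself and would be serialized in a format such as Atom or RDF/XML; this sketch only shows the aggregation relationships.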

6:00 p.m. – 8:00 p.m.

Evening Reception (Nicollet D1/D2, First Floor)

DAY TWO: Tuesday, April 29

8:00 a.m. – 9:00 a.m.

Breakfast (Regency Room, Second Floor)

9:00 a.m. – 10:30 a.m.

Session 3 (Greenway C – E, Second Floor)

A) Web and New Media Strategy at the Smithsonian Institution. Michael Edson [PRESENTATION], Smithsonian Institution In 2007 the Smithsonian Institution began work on a first-ever comprehensive Web and New Media Strategy. This paper describes the strategy-planning process, current status and direction, and a vision for the digital future of the Institution's 28 museums and research centers. The paper will address the issues of brand confusion; external and institutional partnerships; transparency, governance and accountability; quality assurance; supporting technologies; the adoption of Long Tail and Web 2.0 idioms; e-commerce; research support; bridging internet/intranet/extranet boundaries; shared services; convergence; education/educator support; and data strategy.

B) Self-Archiving Legacy Toolkit. Will Snow [PRESENTATION], Stanford University

This paper describes Stanford's work on the Self-Archiving Legacy Toolkit (SALT), which is exploring the capability of a human-computer system to transform unstructured and heterogeneous data into navigable collections of information presented in context. Stanford's University Archives are beginning to digitize and present the collected "papers" of luminary faculty. The full potential of this resource of scientific legacy is not realized by simple digitization. Through the application of semantic processing technologies (coupled with rich visualization tools), a luminary's lifetime collection of research, publications, correspondence and presentations can be accessed not only by keyword, but also by concept, collaborators, time, place, organization and even project. Our hypothesis is that these facets will transform the processing and delivery of personal archival collections, and expose the historical context and intellectual concepts threading through the careers of some of the greatest scientists and thinkers of the 20th century.

Central to the vision of SALT is the notion of self-description of a luminary's own corpus. In addition to providing oral history, video commentary and textual annotation to their collected works, the toolkit gives eminent researchers the tools to create, apply and edit their own taxonomies, ontologies and controlled vocabularies to their works. These knowledge editing tools will enable the archival subject to efficiently identify people, places, times, and concepts mentioned in their collected works and materials, and to draw relationships among these items to reveal paths of influence and the historic progression of ideas. In this way, they can interpret and extend their collection with their own personal viewpoint, creating a uniquely personal presentation of their own life story, which augments rather than replaces the traditional privileged view of archival provenance.

We are testing these tools with Edward Feigenbaum, a luminary AI scientist, whose collected papers have been partly digitized and analyzed. This paper will present findings, demonstrate the tool chain in its current prototype state, and outline areas for future effort.

Session 4 (Greenway F – H, Second Floor)

A) Library Integration with the Campus Enterprise and Beyond. Cody Hanson [PRESENTATION] and Shane Nackerud, both University of Minnesota Over 14,000 University of Minnesota students, faculty, and staff log in to the MyU portal every day. This Metadot-powered site delivers personalized information ranging from course syllabi to pay statements. In the past year, the University of Minnesota Libraries have begun delivering a wide variety of library tools and resources within the MyU Portal. Users can view their circulation information, bookmark their favorite indexes and e-journals, search the Libraries' resources, and view targeted lists of suggested information sources. These suggestions are made based on users' academic or administrative role within the University as represented by "affinity strings" generated from publicly available information managed in the University's PeopleSoft system. Subject librarians match these strings to library resources to provide users with a personalized set of tools relevant to the user's discipline. Through a similar process, undergraduates are presented with suggested resources based on the courses they are taking. By tracking usage according to affinity strings, the Libraries are able to aggregate data while maintaining user privacy. It is our hope that this data will eventually allow us to make resource recommendations to users based on the behavior of their peers. Building on the work we've done to display this information in the MyU Portal, the University Libraries have created mechanisms to move this data into other systems, such as iGoogle. In this session we will describe our efforts to deliver library resources through the MyU system, including the campus partnerships necessary to make this possible. The session will also feature a demonstration of the system, and a discussion of future plans.
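The idea of tracking usage by affinity string rather than by user can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Minnesota implementation: the affinity string values, resource names, and function names below are all hypothetical, and the abstract does not describe the actual string format or storage.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Each usage event records only (affinity string, resource); no user ID is kept,
# so counts can be aggregated without identifying individuals.
Event = Tuple[str, str]


def aggregate_usage(events: Iterable[Event]) -> Counter:
    """Count how often each resource was used under each affinity string."""
    return Counter(events)


def top_resources(counts: Counter, affinity: str, n: int = 3) -> List[str]:
    """Return the n most-used resources for one affinity string.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(
        ((res, c) for (aff, res), c in counts.items() if aff == affinity),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [res for res, _ in ranked[:n]]


# Hypothetical affinity strings and resources, for illustration only.
events = [
    ("grad.history", "JSTOR"),
    ("grad.history", "JSTOR"),
    ("grad.history", "WorldCat"),
    ("ugrad.biology", "PubMed"),
]
counts = aggregate_usage(events)
suggested = top_resources(counts, "grad.history")
```

Ranking by peer usage within an affinity string is one simple way to approach the peer-based recommendations the abstract mentions as a future hope.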

B) Transforming the Student's Experience as Scholar. Harriette Hemmasi [PRESENTATION], Brown University Even in Brown University's highly motivated, democratic, and active learning environment where questioning academic authority occurs regularly and interdisciplinarity abounds, we cannot ignore the fundamental questions plaguing today's academy: How has technology affected teaching and learning? At the heart of this dilemma is the definition of literacy. Literacy in the 21st century has become multi-modal, reflective of the complex fluidity between analog and digital knowledge resources, traditional and non-traditional academic discourse, linear learning and interactive media, as well as the human mind and our limitless imaginations. Embracing this enlarged space gives us the ability to produce and reproduce knowledge in new and deeper ways and to transform the student's experience as scholar and lifelong learner. Hemmasi reports on results of a CLIR Symposium on Scholarly Methods in the Humanities to be held at Brown in April 2008.

10:30 a.m. – 11:00 a.m.

Break (Greenway Promenade, Second Floor)

11:00 a.m. – 12:30 p.m.

Session 5 (Greenway C – E, Second Floor)

PANEL: Implementing Preservation Metadata in Digital Library Applications: Using PREMIS in METS. Rebecca Guenther [PRESENTATION], Library of Congress; Tom Habing [PRESENTATION], University of Illinois at Urbana-Champaign; Nancy Hoebelheinrich [PRESENTATION], Stanford University; Ardys Kozbial [PRESENTATION], University of California, San Diego; Rob Wolfe, Massachusetts Institute of Technology This session will review changes in the next version of PREMIS based on existing implementations of preservation metadata and discuss use of PREMIS within a digital library environment. The panel will include an introduction to PREMIS and changes in PREMIS 2.0 (Rebecca Guenther, LC) and the development of guidelines for using PREMIS with METS (Rob Wolfe, MIT). A panel of implementers will discuss implementation experience with PREMIS and METS, including: Using PREMIS with geospatial data objects (Nancy Hoebelheinrich, Stanford); Transferring digital objects between repositories with preservation metadata (Ardys Kozbial, UCSD); and The ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability (Tom Habing, UIUC).

Session 6 (Greenway F – H, Second Floor)

A) The Challenges of Applying Traditional Cataloging Standards to Non-Traditional, User-Created Descriptive Metadata in Digital Library Projects. Heidi Frank and Jennifer Vinopal [PRESENTATION], both New York University This paper will explore the challenges encountered when applying traditional cataloging standards to non-standardized, user-created descriptive metadata in NYU's digital library production workflow. We will use as a case study one of NYU's digital library projects: the Hemispheric Institute Digital Video Library (HIDVL), undertaken in conjunction with NYU's Hemispheric Institute of Performance and Politics. In planning NYU's first significant, grant-funded digital library projects several years ago, we naturally sought to capitalize on our existing strengths to describe the scholarly content in our digital collections. Thus, cataloging has, to date, played a very important role in our digital library production by providing a standardized process for creating or cleaning up descriptive metadata before DL objects are ingested into our preservation repository and subsequently published on our website. However, in practice, we find drawbacks to this approach. For the HIDVL, in which our project partners create the initial descriptive metadata for the videos, it is, ironically, the very process of standardization through cataloging that has challenged our ability to adequately represent the objects in the collection. Through specific examples, we will discuss the limits of MARC/AACR2 to model how our project partners and users think about scholarly content, and we will suggest ways to mitigate this problem through modified workflows and alternate processes for collecting and storing metadata.

B) The Good, the Bad and the Ugly: Corralling an Okay Text Corpus from a Whole Heap o' Sources. Glen Worthey, Stanford University An increasingly important type of digital humanities research uses statistical evidence gleaned from "comprehensive" text corpora. This paper discusses the roles and challenges of the digital librarian in supporting this type of research, with specific emphasis on the curation of a statistically significant corpus. Both the size of a text corpus and any inherent bias in the corpus composition are potentially significant factors in such research. Although the number of available electronic texts from which to draw a research corpus is increasing rapidly, the variability of these texts is likewise increasing: no longer can we count on working only with well-behaved (or even well-formed), re-keyed, marked-up texts of canonical literary works. Mass digitization (often with uncorrected OCR) and a wider variety of commercial sources are both blessing and curse: we are forced to admit that most of these less-than-perfect, less-than-homogeneous full-text collections might offer significant advantages of scale and scope. This paper examines, in a few case studies, the gathering and manipulation of existing digital library sources, and the creation of new digital texts to fill in gaps. I will discuss the assessment of project requirements; the identification of potential sources; licensing issues; sharing of resources with other institutions; and the more technical issues around determining and obtaining "good-enough" text accuracy; "rich-enough" markup creation, transformation and normalization; and customized access to the corpora in ways that don't threaten more everyday access to our digital resources. Finally, I'll discuss some of the issues involved in presenting, archiving, and preserving the resources we've created for "special" projects so that they both fit appropriately into our "general" digital library, and remain useful for future research.

12:30 p.m. – 2:30 p.m.

Break for Lunch and Networking [Individual choice]

12:30 p.m. – 2:30 p.m.

Aquifer Technical Advisors Meeting—for participants only (Greenway I, Second Floor)

Birds-of-a-Feather Sessions 2:30 p.m. – 3:30 p.m.

1) Considering the DLF Web Site. (Greenway C – E, Second Floor) Eric Celeste [PRESENTATION], Consultant; Barrie Howard, DLF The Digital Library Federation (DLF) relies on its Web site to communicate with everyone from staff of member libraries to the public at large. The Web site mediates the first contact much of the world has with DLF and its mission. It serves as a communication mechanism for DLF members and working group participants. It becomes the archive and record of DLF accomplishments. Take a look at the functional requirements for the next iteration of the DLF Web site and let us know what we are missing and what platforms you think we should consider for building it.

2) Asset Actions Next Steps: Atom/OAI-ORE and Zotero. (Greenway F – H, Second Floor) Todd Grappone, University of Southern California; Esha Datta, New York University; Jody DeRidder, University of Tennessee; Tom Habing, University of Illinois at Urbana-Champaign The DLF Aquifer project is extending its asset action framework to work with Atom/OAI-ORE syndication. As libraries move more and more to externally hosted content like American Social History Online, we are seeing the emergence of the Digital Library as a Service (DLaaS): a service where content resides outside of its native library system to achieve greater usability and visibility. For libraries and researchers to take advantage of this type of content, we need to develop web agents that make it more usable. To this end the Aquifer project has offered Asset Actions as a model for this type of usable content. An asset action package is an XML-defined set of actionable URIs for a digital resource that delivers named, typed actions for that resource. Packages are made up of action groups, which are sets of actions, with an optional set of parameters that are specific to the group. Every asset action package contains at least a "default" action group that provides a basic high-level set of actions. Recent work in implementation of the framework has focused on Zotero integration to take advantage of Asset Actions via Atom/OAI-ORE. This discussion session will focus on the recent work in this area from the Aquifer Technology and Architecture Working Group.
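The package structure described in the abstract above (named, typed actions collected into groups, with a required "default" group) can be modeled roughly as follows. This is a sketch under stated assumptions rather than the Aquifer schema: the class names, field names, the "getPreview" action, and the example.org URI are all hypothetical, and the real packages are defined in XML.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    name: str        # the action's name, e.g. "getPreview" (hypothetical)
    media_type: str  # the type of what the URI returns
    uri: str         # the actionable URI for the resource


@dataclass
class ActionGroup:
    name: str
    actions: List[Action] = field(default_factory=list)
    # Optional parameters that apply only to this group.
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class AssetActionPackage:
    resource_id: str
    groups: List[ActionGroup] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Per the abstract, every package has at least a "default" group.
        if not any(g.name == "default" for g in self.groups):
            raise ValueError('package must contain a "default" action group')

    def find(self, group: str, action: str) -> Action:
        """Look up one action by group name and action name."""
        for g in self.groups:
            if g.name == group:
                for a in g.actions:
                    if a.name == action:
                        return a
        raise KeyError(f"{group}/{action}")


package = AssetActionPackage(
    resource_id="example:item-1",
    groups=[
        ActionGroup(
            "default",
            [Action("getPreview", "image/jpeg", "http://example.org/item-1/preview")],
        )
    ],
)
preview = package.find("default", "getPreview")
```

A client such as a citation manager could call a lookup like `find` to discover which URI to request for a given action, without knowing anything else about the hosting repository.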

3) Open Access Mandates: Opportunities for DLF Institutions. (Greenway A, Second Floor) Geneva Henry, Rice University Recent actions by the NIH and Harvard mandating open access of faculty publications have created a window of opportunity for aggressively promoting the institutional repositories on our campuses. As we work with faculty to educate them on open access and about our repositories, libraries are developing or publicizing guidelines and services to facilitate self-archiving and ensure that faculty meet emerging mandates. Discussion on listservs around these issues has been lively over the past couple of months and faculty are now very interested in the topic. Given the opportune timing, are we leveraging it to the best of our abilities? Success in this area will be realized with as many research institutions as possible adopting open access mandates for faculty publications. DLF institutions are among the top research universities that can lead this movement and bring others along with them to adopt open access self-archiving of research publications. This discussion session is intended to explore ways in which resources and activities can be shared and coordinated among DLF libraries to create favorable environments to promote self-archiving. What information can we give our faculty and administration to help them advocate an open access mandate through their faculty governance bodies? What are we learning from them as we move forward to help meet existing mandates? How do we leverage the NIH mandate to a campus-wide mandate? What has been working and what hasn't? By sharing information and ideas, DLF institutions can make a positive and powerful impact. Bring your thoughts for discussion.

4) Implementing the ILS-Discovery Interface Recommendations. (Greenway B, Second Floor) John Mark Ockerbloom, University of Pennsylvania; DLF ILS Discovery Interface Task Force The DLF's ILS-DI group has been refining its recommendations for interfaces between the ILS and discovery applications (an early draft of which was discussed at the last DLF Forum), and discussing them with vendors and developers. This BOF session will discuss the further development of the recommendation, response to it from vendors and developers, and reference implementations and client applications based on the recommendation. We can also discuss ways to encourage adoption of the recommendation, and support for the associated standards, by ILS and application developers and vendors.

5) ARTstor — A Platform for Broad-scale Sharing of Digital Images (Greenway J, Second Floor) Carole Ann Fabian and James Shulman, both ARTstor

ARTstor started with the goal of creating and sharing quality collections of digital images that respond to educational and scholarly community needs. With approximately 750,000 images currently available and thousands more to be released over the next few years, we now have a significant - although by no means complete - number of core collections in the library. In addition, we currently offer features and services within the ARTstor environment that support the ability for individual and institutional users to view their own hundreds of thousands of personal images and over one million hosted images alongside the ARTstor collections.

In working with ARTstor collection contributors and hosting participants, we are increasingly aware of growing community interest in sharing digital images across institutional boundaries. We are frequently engaged in discussion and participate in projects with various groups across our community of contributors and users interested in exploring the potential opportunities and constraints (legal, technical and practical) that exist for broad-scale sharing of scholarly digital image content across varied institutions and organizations.

This session will report on ARTstor's institutional collections pilot program (including results of our recent survey of current practices for institutional digital image collection development) as well as our plans for Hosting Phase Two. We will also share our ideas about leveraging the ARTstor technology platform to provide a more open environment for sharing digital image collections and facilitate discussion on what role ARTstor might play in helping to build out an open, trusted and persistent network for sharing image content of interest to educational and research users.

3:30 p.m. – 4:00 p.m.

Break (Greenway Promenade, Second Floor)

4:00 p.m. – 5:30 p.m.

Session 7 (Greenway C – E, Second Floor)

A) Mitigating Preservation Threats: Standards and Practices in the National Digital Newspaper Program. David Brunton and Deborah Thomas [PRESENTATION], both Library of Congress The National Digital Newspaper Program (NDNP), a partnership between the National Endowment for the Humanities (NEH) and the Library of Congress (LC), is a long-term effort to develop an Internet-based, searchable database of all U.S. newspapers with descriptive information and select digitization of historic pages contributed by, eventually, all U.S. states and territories. Because the program aggregates digital content produced by many institutions, its use of digitization standards and validation in the production and management of this data is critical to mitigating preservation threats and risks to the digital collections over time. This presentation will discuss the strategic development and implementation of these standards and operations and how they enhance our efficiency and sustainability, both during the life of the program and beyond.

B) DLF Environmental Scan: Issues and Solutions for Archives in a Digital World. Jen Mohan [PRESENTATION], Intelligent Television This environmental scan, sponsored by the DLF and funded by the Mellon Foundation, aims to gather substantial information on moving image archives across the United States and assess their readiness to be digitized. Tens of thousands of moving image materials are locked in analog formats, and it was the goal of the project to delve into what issues and obstacles archives and other media repositories face in terms of digitizing their holdings. Focusing on museums, state archives, colleges and universities, public television stations, moving image archives and public libraries, the environmental scan sought to obtain information on the material condition, metadata and cataloguing records, storage conditions, and other technical and organizational impediments these institutions face in terms of launching digital projects. Another product of the environmental scan was its development of a Bill of Rights that would help guide archives and repositories that have entered, or are considering entering, into agreements with private companies regarding digitizing their holdings. Crafted from opinions and issues raised by the participants, it is a launching point for institutions that are considering entering into such agreements.

Session 8 (Greenway F – H, Second Floor)

A) Towards a Social Science Data Network. Micah Altman [PRESENTATION], Harvard University Cyberinfrastructure has the potential to catalyze the application of sophisticated analytic methods to a new evidence base, revolutionizing the social sciences. There are, however, many gaps in the current infrastructure supporting scholarly analysis: gaps between theory and data collection, between collection and analysis, and among analysis, publication and reuse. We describe areas of research and development aimed at closing these gaps. Together this set of research and development projects brings the social sciences closer to an integrated Data Network for research. In the future, such Data Networks will enhance the replicability of research, enable new forms and scales of analysis, and ultimately strengthen the connections between scientific evidence and the publications and policies that rely on it.

B) MobiLIB: A Mobile Library Service at North Carolina State. Markus Wust [PRESENTATION], North Carolina State University Despite the emergence of devices such as Apple's iPhone that advertise their ability to display regular-sized websites, the small displays on most phones still require mobile-specific pages, such as the ones that are already offered by large mobile service providers (e.g., Google, Yahoo, MSN, Facebook). Recognizing the growing importance of such devices in our students' lives, NCSU Libraries has developed MobiLIB, a set of primarily library-related services that have been designed for improved accessibility through mobile phones or PDAs and that have been selected based on their potential usefulness within a mobile usage context. Currently, MobiLIB consists of seven services:

- A mobile catalog based on the NCSU Libraries' "CatalogWS" catalog API
- A page listing the number of currently available computers within the main library building
- A listing of today's and tomorrow's opening hours
- A campus directory
- Contact information for key library departments and services
- A directory of links to external service providers
- Current status information for all university buses

After a brief overview of the different types of mobile services currently offered by other libraries, this presentation will describe the library's motivation for creating MobiLIB as well as the individual services and some of the mobile-specific design decisions. The presentation will conclude with a discussion of some preliminary usage statistics.

C) XTF 2.1: Powerful Search and Display without the Headaches. Martin Haye [PRESENTATION], California Digital Library The eXtensible Text Framework (XTF) has advanced rapidly since its inception in 2001, with each release adding significant enhancements and each year bringing adoption by more online publishing projects, both at the California Digital Library (CDL) and institutions worldwide. The new XTF 2.1 release lowers development barriers to utilizing its powerful search and display technologies. The XTF user community identified a set of advanced features that have been challenging for them to implement, and with this guidance we refactored and expanded XTF's out-of-the-box support. Features that are now easier to customize and deploy include faceted browsing, hierarchical facets, multi-field keyword searching, OAI-PMH, user bookbags, spelling correction, and similar item suggestions. While the mantra of XTF has been and remains "adaptation through programming" in XSLT, deployment is significantly eased with the inclusion of boilerplate (and often non-trivial) code in the default stylesheets to deliver a full, rich user experience. This paper will show how these features can be delivered using XTF, drawing examples from the new default interface as well as production XTF implementations such as the Mark Twain Project Online and eScholarship Editions. We'll conclude with a brief look at planned improvements for future releases.
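To make the idea of hierarchical facets concrete, here is a minimal Python sketch of how facet counts can roll up through a hierarchy. It is not XTF code (XTF is customized through XSLT), and the "::" separator, the `subject` field name, and the sample records are assumptions made purely for illustration.

```python
from collections import Counter

def hierarchical_facet_counts(docs, field, sep="::"):
    """Count documents per facet value, including every ancestor level.

    A value such as "History::Europe" contributes to both "History"
    and "History::Europe". Each document is counted at most once per node.
    """
    counts = Counter()
    for doc in docs:
        seen = set()
        for value in doc.get(field, []):
            parts = value.split(sep)
            for depth in range(1, len(parts) + 1):
                seen.add(sep.join(parts[:depth]))
        counts.update(seen)
    return counts

# Hypothetical sample records, not taken from any real collection.
docs = [
    {"title": "A", "subject": ["History::United States"]},
    {"title": "B", "subject": ["History::Europe", "Literature"]},
    {"title": "C", "subject": ["History::United States"]},
]

counts = hierarchical_facet_counts(docs, "subject")
for facet in sorted(counts):
    print(facet, counts[facet])
```

A browse interface built on counts like these can show the top-level value with its total and let the user drill down into the narrower values beneath it.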

DAY THREE: Wednesday, April 30

8:00 a.m. – 9:00 a.m.

Breakfast (Regency Room, Second Floor)

9:00 a.m. – 10:30 a.m.

Session 9 (Greenway C – E, Second Floor)

PANEL: Implementing User-centered Design with Agile Software Development. Martin Halbert, Katherine Skinner [PRESENTATION], and Kyle Fenton, all Emory University; Tom Habing and Susan Harum, both University of Illinois at Urbana-Champaign; Chick Markley, DLF Aquifer Both Emory University's SouthComb Cyberinfrastructure for Scholars project and DLF Aquifer's American Social History Online have used similar approaches to ensure that user experience drives development. This panel will review the methods we used effectively and demonstrate the results achieved.

A. The Emory Cyberinfrastructure for Scholars project works directly with scholars at multiple universities on issues of user navigation, portal interaction, and application functionality. This presentation on the resulting SouthComb service will:
1. Describe the process of engaging both users and mid-level service managers in designing the user experience for the SouthComb interdisciplinary metasearch and social information service
2. Provide findings about integrating the agile development process into a user-centered design loop

B. The DLF Aquifer initiative's goal of making digital content easier for scholars to find and use is being realized through the American Social History Online website and associated services. This presentation will:
1. Highlight recent enhancements to the website, based on user feedback during the agile development process, that support discovering previously unknown material
2. Describe what early assessment efforts using Google Analytics reveal about use of the website
3. Explain current work to incorporate the Aquifer asset action experiment, described at the DLF Spring Forum in 2006, into a production-level set of services in American Social History Online that can be used by tools like Zotero

Session 10 (Greenway F – H, Second Floor)

A) DSpace Foundation: Charting a course to support the DSpace community and advance the DSpace open source software platform. Michele Kimpton [PRESENTATION], DSpace Foundation

The DSpace Foundation recently completed a survey of users of the DSpace platform. The purpose of the survey was to find out how the Foundation could best serve the community in their use of DSpace software, and to identify the biggest challenges faced in creating a successful digital repository. Over 350 users worldwide responded to the survey. The presentation will discuss the current frustrations of running a digital repository, the roadblocks to creating a successful repository, and ways to improve community participation and engagement within DSpace. The Foundation has outlined a plan of action to address these issues and looks to share its results more broadly with digital librarians who face similar issues regardless of the software platform being used.

The plan of action will include the re-architecture of the DSpace platform, known as DSpace 2.0. The recommendations for the re-architecture of DSpace were issued in late 2006 by the technical advisory board. The community of developers has been unable to orchestrate the entire development for such a large undertaking. The Foundation has recently received funding from JISC, MIT and HP to lead the re-architecture of the platform to enable a more modular, flexible, service-oriented approach. The presentation will highlight some of the work to be done, and in particular will discuss the benefits and challenges of re-architecture for new and existing users.

B) Building a Large-Scale Preservation Repository Based on aDORe. William Kehoe [PRESENTATION], Cornell University When Cornell University Library decided to preserve many terabytes of digitized books, we didn't have a ready-made infrastructure in place. Like every other institution which builds a large-scale repository on the Open Archival Information System model, we faced an unfamiliar set of problems brought on by the scale of the undertaking. We knew we were risking the loss of a very large portion of our digital assets, as well as the waste of much labor, if we failed to manage successfully a collection of many millions of files. We needed an easily scalable system we could administer with little labor. We decided that a standards-based system backed by the archival storage of an aDORe repository would do the job. This presentation will consider some of the challenges and solutions of the infrastructure we built.

C) Experiment to Investigate the Scalability of a DSpace-based Archive. Dharitri Misra, U.S. National Library of Medicine An important but often overlooked factor in building a large-scale archive is determining its scalability, that is: whether the archive would accommodate large numbers of items, in terms of ingesting, indexing or providing access, without compromising performance. Challenges in benchmarking system performance include difficulties in assembling large amounts of test data to conduct realistic tests, and developing tools to record and interpret the results. Here we describe an experiment to test the scalability of DSpace by building a million-item archive and measuring ingest time as a function of archive size. It was conducted using a DSpace-based system called SPER (System for the Preservation of Electronic Resources), developed at an R&D division of the U.S. National Library of Medicine to investigate important aspects of digital preservation, including automated metadata extraction. In our experiment, more than a million submission items were generated by replicating actual documents from an early 20th century medico-legal collection of the Food and Drug Administration -- Notices of Judgment (NJ) of court cases against manufacturers of misbranded or adulterated foods and drugs. SPER is used both to extract metadata for each NJ from the scanned TIFF images using machine learning tools, and then to archive the NJs with their metadata in its DSpace-based archive. Our presentation describes the design and implementation of tools for generating the input data and conducting the scalability experiment. Our results illustrate the variation in ingest time with archive size, and show that DSpace can support ingest of million-plus items with acceptable performance.
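The measurement described above, ingest time as a function of archive size, can be sketched in a few lines of Python. This is only an illustrative harness under stated assumptions: `fake_ingest` and the in-memory `archive` dictionary are hypothetical stand-ins, not SPER or DSpace calls, and the item and batch counts are arbitrary.

```python
import time

def measure_ingest(ingest_batch, total_items, batch_size):
    """Ingest items in batches; return (archive_size, seconds_per_item) samples."""
    samples = []
    archive_size = 0
    while archive_size < total_items:
        n = min(batch_size, total_items - archive_size)
        start = time.perf_counter()
        ingest_batch(archive_size, n)  # ingest items [archive_size, archive_size + n)
        elapsed = time.perf_counter() - start
        archive_size += n
        samples.append((archive_size, elapsed / n))
    return samples

# Hypothetical stand-in for a real archive: a plain dictionary.
archive = {}

def fake_ingest(first_id, count):
    """Pretend to ingest `count` items starting at `first_id`."""
    for i in range(first_id, first_id + count):
        archive[i] = {"title": f"NJ {i}"}

samples = measure_ingest(fake_ingest, total_items=1000, batch_size=250)
for size, per_item in samples:
    print(f"{size:>6} items in archive: {per_item * 1e6:.1f} us/item")
```

Plotting the per-item time against the archive size at each sample is one simple way to see whether ingest cost grows as the archive fills.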

10:30 a.m. – 11:00 a.m.

Break (Greenway Promenade, Second Floor)

11:00 a.m. – 12:30 p.m.

Session 11 (Greenway C – E, Second Floor)

A) Introducing BibApp 1.0. Eric Larson [PRESENTATION], University of Wisconsin-Madison; Sarah L. Shreeves, University of Illinois at Urbana-Champaign In 2007 the University of Wisconsin-Madison Library began to collaborate with the University of Illinois at Urbana-Champaign on the development of a 1.0 version of a tool called BibApp. Originally developed by staff at the Wendt Library at UW-Madison, the BibApp is an 'institutional bibliography' or 'citation repository' system that allows easy ingest and export of faculty publication histories at individual and departmental levels. BibApp facilitates mining the publication history of an institution's faculty to allow a data-driven view of what and where faculty are publishing. This data can be used to better understand current and long term trends, as well as to identify which material can be made available through an open access repository. This paper will describe BibApp 1.0, which leverages the latest web application development technologies to ensure simplicity of design and encourage collaborative development. The BibApp technology stack includes Ruby on Rails 2.0, Solr / Lucene, and a RESTful architecture, and uses some of the tools developed by the Zotero project and others. We will present findings from focus groups with faculty, explain how we developed BibApp to meet the needs identified through this and other work, and discuss future plans for the BibApp (including understanding ties between faculty members across campus as well as strength of ties between groups on campus). A prototype implementation of the original version (0.4) of the BibApp can be seen at http://www.library.uiuc.edu/bibapp/ (note that this is a prototype and may not be reliably available).

B) The NEW IUScholarWorks at Indiana University: Repositories, Journals, and Scholarly Publishing. Randall Floyd [PRESENTATION], Indiana University

Indiana University recently announced the publication of Museum Anthropology Review, the first faculty-generated electronic journal supported by the IU Bloomington Libraries and the IU Digital Library Program. Museum Anthropology Review is a pilot test of IUScholarWorks Journals, a new service that runs on Open Journal Systems software. It is a companion to Indiana University's institutional repository, launched in early 2006 and initially named IUScholarWorks, which runs on DSpace repository software.

In conjunction with the launch of the journal service, the IUScholarWorks name has been expanded in scope to cover a set of services for open-access scholarly communication that includes an institutional repository, support for journal publishing, and utility services for interoperability, and that is open to the future addition of related services. As part of this restructuring, the institutional repository was renamed IUScholarWorks Repository, the journal service was named IUScholarWorks Journals, and a new website was created for IUScholarWorks to serve as a portal to services that support open-access scholarly communication. A consistent visual identity was applied across all services to create a seamless user experience.

We propose to describe and discuss the evolution of IUScholarWorks and provide an overview of the technologies used to support its services. We will also explore our experiences in our first and very successful faculty partnership to support the electronic publication of a journal using Open Journal Systems software. Finally, we will provide insight on the future of IUScholarWorks services and what technical work will be required to meet new and existing challenges.

Session 12 (Greenway F – H, Second Floor)

A) Turning the Pages 2.0—One Year On. Michael Stocking [PRESENTATION], Armadillo Systems In February 2007 the British Library launched Turning the Pages 2.0 with a little help from Bill Gates. The idea was to provide a truly "next-generation" user experience for library visitors (onsite and offsite). One year on, what have we learnt about the value of 3D, the use of collaborative working and the initial toolset? Has the platform choice (Windows Presentation Foundation and Silverlight) and the file format (JPEG XR) helped or hindered our progress? We'll also address the issue of scale - what's the point of an application if it can't cope with 1000 books? Moving forward we explain plans to integrate Seadragon-like facilities into the application as well as use the power of XAML to depict 3D artefacts.

B) HarvestChoice: Developing Biblio-spatial Integrations for Search. John Butler, Chad Fennell, and Philip Pardey, all University of Minnesota This presentation provides a programmatic and technical overview of the HarvestChoice initiative (http://harvestchoice.org/), which seeks to accelerate and enhance the performance of agricultural systems most likely to bring benefit to the world's poor and undernourished. With a multi-year grant from the Gates Foundation's Global Development Program, HarvestChoice and its growing number of collaborators are developing a web portal that ties together databases, tools, analyses, and syntheses intended to improve investment and policy decisions. The initiative makes extensive use of literature sources and reviews, household surveys, GIS-based data sets and analytical tools, crop growth simulation methods, and a suite of spatially disaggregated multi-market and economy-wide models. A variety of applications support the HarvestChoice portal in the delivery of geospatial and bibliographic data to end users. These include the Drupal content management system for the delivery of site content, as well as the storage of bibliographic data and related metadata. Georeferenced data is housed in a PostGIS data store with a GeoServer interface for the dynamic generation of map data (Web Map Service) as well as complex query construction for raw geographic data (Web Feature Service). The GeoNetwork Information Management System provides support for the storage of standard geographic information metadata (ISO 19115) and acts as a front-end client to the GeoServer. Tying together geospatial and bibliographic data, the Solr enterprise search server works in conjunction with Drupal to provide end users with an integrated search experience of all data across the project.
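As an illustration of the Web Map Service side of such an architecture, the following Python sketch builds a standard WMS 1.1.1 GetMap request URL. The host, the `demo:crop_yield` layer name, and the bounding box are hypothetical and are not taken from the HarvestChoice deployment; only the query parameter names come from the WMS specification.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def wms_getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the layer's default style
        "SRS": "EPSG:4326",      # longitude/latitude in WGS 84
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and layer name, used only for illustration.
url = wms_getmap_url(
    "http://example.org/geoserver/wms",
    "demo:crop_yield",
    (-20.0, -35.0, 55.0, 40.0),
    600,
    600,
)
print(url)

# Parse the URL back to confirm the parameters round-trip as expected.
query = parse_qs(urlparse(url).query, keep_blank_values=True)
print(query["REQUEST"], query["BBOX"])
```

A Web Feature Service request follows the same key-value pattern but returns the underlying features rather than a rendered image.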

12:30 p.m.

Adjourn

POST-CONFERENCE 1:00 p.m. – 5:00 p.m.

METS Open Meeting—open to all (Greenway A, Second Floor)

SouthComb Meeting —for project participants (Greenway B, Second Floor)

Thursday, May 1

POST-CONFERENCE 8:30 a.m. – 12:30 p.m.

METS Editorial Board Meeting—for project participants (Lake Nokomis, Fifth Floor)

DLF ILS Discovery Interface Task Force—for project participants (Lake Minnetonka, Fifth Floor)
