The UK has a tradition of centrally funded activity in the provision of network and information services. The utility of this central activity is recognised, and is now being built on to create a Distributed National Electronic Resource. This will further develop a rich information and learning resource, woven into the fabric of people's working and learning lives.
This presentation will describe the current provision of national services, and the steps being taken to move these to a new level. It will discuss a coherent national programme which embraces:
It will discuss issues posed by this ambitious programme, and describe current achievements.
In its charter, the DLF states its aim as bringing "together -- from across the nation and beyond -- digitized materials that will be made accessible to students, scholars, and citizens everywhere, and that document the building and dynamics of America's heritage and cultures". Its progress in providing the level of interoperability envisaged in this mission statement has, like that of the library community more generally, been slowed by the limitations inherent in existing solutions. The Open Archives initiative promises to enhance activity in this area significantly.
The OAi supplies a new technical framework for accessing networked information resources, one that adapts a harvesting technique used by the Internet search engines. The framework envisages data providers and service providers. A data provider agrees to support a simple harvesting protocol and to provide extracts of its metadata in a common, minimal-level format in response to harvest requests. It then records information about its collection in a shared registry. A service provider uses this registry to locate participating data providers, and uses the harvest protocol to collect metadata from them. The service provider is then able to build intellectually useful services, such as catalogs and portals to materials distributed across multiple sites. The framework has many virtues.
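The division of roles described above can be illustrated with a minimal in-memory sketch. It is not the OAi protocol itself: the class names, the `harvest` method, and the three "common" metadata fields are all assumptions made for illustration, and a real deployment would exchange requests over the network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class DataProvider:
    """A repository that answers harvest requests (hypothetical interface)."""
    name: str
    records: list  # full local records, as dicts

    def harvest(self):
        # Return only a minimal, common subset of each record's metadata.
        common_fields = ("id", "title", "creator")
        return [{k: r.get(k, "") for k in common_fields} for r in self.records]


@dataclass
class Registry:
    """Shared list of participating data providers (hypothetical)."""
    providers: list = field(default_factory=list)

    def register(self, provider):
        self.providers.append(provider)


class ServiceProvider:
    """Builds a simple union catalog from harvested metadata."""

    def __init__(self, registry):
        self.registry = registry
        self.catalog = []

    def harvest_all(self):
        # Locate providers through the registry, then harvest each one.
        self.catalog = []
        for provider in self.registry.providers:
            for record in provider.harvest():
                # Remember which site each record came from.
                self.catalog.append({**record, "source": provider.name})

    def search(self, term):
        # Case-insensitive substring match on titles only.
        term = term.lower()
        return [r for r in self.catalog if term in r["title"].lower()]


registry = Registry()
registry.register(DataProvider("site-a", [
    {"id": "a1", "title": "Civil War Letters", "creator": "Unknown", "format": "TIFF"},
]))
registry.register(DataProvider("site-b", [
    {"id": "b1", "title": "Letters from the Frontier", "creator": "J. Smith"},
    {"id": "b2", "title": "Railroad Maps", "creator": "Survey Office"},
]))

service = ServiceProvider(registry)
service.harvest_all()
hits = service.search("letters")
```

In this sketch the service provider never sees local fields such as `format`; it works only with the minimal shared subset, which is what lets one catalog span sites that describe their collections differently.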
In this session, speakers will outline recent developments within the OAi (now supported jointly by the DLF and the Coalition for Networked Information), introduce the technical framework, and demonstrate its potential for digital library and other communities that are interested in the open interchange of scholarly information.
Indiana University is launching a four-year initiative to establish a Digital Music Library (DML) testbed, to develop applications for education and research in the field of music, and to conduct digital library research in the areas of instruction, usability, and intellectual property rights. Through this collaborative project, involving information technology specialists, researchers, librarians, and music experts, IU will develop software tools and applications to support music teaching, learning, and research.
The DML will provide access to a collection of music in several formats from a range of musical styles. The collection will include sound recordings of musical performances; images of published scores; encoded score notation files; MIDI format files for audio playback; and active links that connect a musical work in one format to a representation in a different format.
The DML will provide integrated multimedia library services including search, retrieval and synchronized playback of recorded music, MIDI files and encoded music notation files; navigation within individual recordings or other music representations; access control, authentication, and metadata for rights management.
The DML will also provide a framework for developing software that integrates the collections of the DML into applications for teaching and research in the field of music.
Digital library research will be coordinated so that research outcomes will be incorporated in testbed and application development activities. Usability research will be integral to creation of the DML and will be incorporated in the design process. Copyright research will help identify content for inclusion in the DML by defining categories of works that are in the public domain, identifying alternatives for providing access to protected works, and helping specify system design features that may satisfy access restriction requirements.
Participating with Indiana University in this project are a number of "satellite sites" at remote locations in the U.S. and overseas, including: University of Illinois at Urbana-Champaign, University of Massachusetts-Amherst, Northwestern University in the U.S.; King's College, Loughborough University, Oxford University in the U.K.; and Waseda University in Japan.
The Library of Congress is planning a new National Audio-Visual Conservation Center in Culpeper, Virginia, scheduled to open in 2003. The Center will feature improved storage for the Library's recorded sound and moving image collections, a new nitrate film laboratory, a collections processing and cataloging activity, and a multipurpose digital facility. The digital facility will support the preservation of sound and video recordings, conduct research to improve digital preservation, and provide remote access to audio-visual collections for researchers in the Library's Capitol Hill reading rooms.
Prototyping and design for the digital facility is taking place in 2000-2002, with implementation to continue as the Center opens and begins operation. There will be two key elements: digital production and a repository. The production facility at the Center will reformat existing collections and process newly acquired audio-visual materials in digital form. In planning for the repository, the Reference Model for Open Archival Information Systems (OAIS) has proved helpful. The audio-visual group will focus on the specialized functional elements of the model called ingestion and access. Meanwhile, the project will participate in Library-wide development of an enterprise-service repository that will provide the archival storage, administration, and data management functions for all forms of digital content.
The Audio-Visual Prototyping Project is currently undertaking a number of feasibility tests and studies that underpin the broader planning effort: (1) identifying computer-file formats suitable for the preservation reformatting of recorded sound collections, including those with visual and textual elements, (2) experimental capture of curator-selected Web sites deemed suitable for addition to the Library's audio-visual collections, (3) the definition of descriptive, structural, and administrative metadata to be captured in association with the production process, (4) development of a preliminary methodology for the capture of this metadata, and (5) applying an XML-based encoding scheme to audio-visual digital archival objects. The XML scheme being tested is the one developed for the Making of America 2 project by the University of California at Berkeley.
The Audio-Visual Prototyping Project is being carried out by the Motion Picture, Broadcasting, and Recorded Sound Division, supported by the National Digital Library Program and the Preservation Directorate.
The Survivors of the Shoah Visual History Foundation (VHF) is a non-profit institution founded by Steven Spielberg in 1994. The goal of the VHF is to catalogue and archive video and other data relating to historical events and to make these materials available to museums, educational institutions and non-profit organizations throughout the world via mass media (books, films, CD-ROMs, etc.) and on-line, interactive networks. VHF's current focus is to document the Holocaust, or Shoah, through first hand testimonies of Holocaust survivors and witnesses in such a manner that it may serve as a tool to teach the public about racial, ethnic, and cultural tolerance.
As of October 2000, VHF has conducted over 51,000 interviews in 32 languages at a cost of approximately $2,000 per interview, including indexing and cataloging. The VHF has developed a sophisticated system using the EMC Media Server, ADIC AML mass storage tape library and SGI clients for storage, management, and display of the over 180 terabytes of multimedia data. The VHF is actively digitizing and indexing interviews and other materials, which are then stored and archived, all using a Digital Library Software Architecture developed by the Foundation. The VHF-developed database, using Sybase, permits querying by keywords or people to retrieve and display digital segments of video or still imagery collected from the interviews.
Dissemination of the archive has recently become the primary focus of the Foundation. The Foundation has been creating research and commercial partnerships in order to make the archive available for use in schools, universities, libraries and other public institutions around the world. The first research relationship, which began in 1997, was with the Department of Defense (DoD) through a high speed network test bed called ATDNet. The DoD, in tandem with QWEST communications, supplied the Foundation with OC3 ATM connections between museums in Los Angeles and Washington, D.C. Using this test bed, the Foundation has built systems for disseminating the archive over high speed networks. A similar relationship is being created with Internet-2, which will enable access to the Foundation's archive at 170+ universities in the United States.
The creation and acquisition of large digital holdings as part of our library collections has led to a need for review and sometimes recreation of existing library policies in order to address issues and concerns around digital materials. This talk will look at ways in which established library principles and policies can be extended to digital materials, at the need for new statements of practice, and at the importance of involving stakeholders from throughout the library community. It will focus these issues with two example cases: the development of policy and guidelines around digital reformatting of embrittled originals and the development of preservation principles for digital materials.
Recent reports suggest that humanities scholarship is not yet well supported by digital libraries and that future system design should be informed by a more thorough understanding of the practices and needs of working scholars. This paper presents highlights from a research project that specifically addressed this problem through a qualitative study of work processes and technology use in the humanities. Through interviews and case studies, we examined the information work involved in the current research projects of a group of active humanities researchers. The results update our understanding of the relationship between the activities, resources, and technologies involved in scholarly work and assess the effect that new technologies are having on the scholarly process. While scholars are interacting with a larger universe of materials and using some digital resources in new and creative ways, primary source material and bibliographies remain vital. Based on our analysis of the way scholars work and the things that they value, we propose a series of principles and priorities for the development of user-centered digital collections and services.
This paper will be a report on the DLF initiative, "Strategies for developing sustainable and scaleable digital library collections," currently underway and due to be completed in the spring of 2001. Three authors have begun work on papers that will present an overview of DLF members' policies and practices in digital collection development and identify best practices in three areas:
Smith will discuss the approaches that the authors are taking, present preliminary findings and their implications for libraries, and outline the process of review, refinement, and endorsement scheduled by DLF.
At the Library of Congress, the Web Preservation Project is an experimental project to identify, select, collect, preserve and offer access to open-access materials from the World Wide Web. A cross-disciplinary team has worked to: build a consensus within the Library, develop partnerships with external bodies, study selection criteria, technical and policy issues, and establish a prototype system for future planning and growth. The presentation will provide an overview of this project and describe in more detail the Library's approach to the creation of collections policies and practices in the selection and acquisition of born digital materials.
Recently the DLF has been working to reduce one of the most substantial impediments to the development of digital archival repositories for electronic scholarly journals: the lack of agreement about the minimum requirements that all stakeholders (including publishers as journal producers, libraries as journal consumers, and scholars as journal authors and readers) will have of the digital archival repository. Working together with CLIR and with CNI, the DLF has developed a statement of requirements and reached consensus about it with groups of libraries and publishers. It is now extending its work by documenting levels of service that the archival repository may supply. Once these service levels are documented, they will be discussed at stakeholder workshops convened to identify the minimum service levels that are acceptable to all parties. With these benchmarks to hand, libraries, publishers, and scholars can begin to have more meaningful discussion (even negotiation) about digital preservation. In addition, would-be digital archival repositories will have a route map for their development and an environment of trust that may prove essential to their long-term viability. The session will supply an overview of the DLF's work in this area and an early opportunity to present draft archival service levels for public review and comment.
Since 1998, the National Archives and Records Administration has launched a number of research and development projects addressing the challenges posed by the need to preserve and deliver ever increasing numbers and varieties of electronic records. These initiatives aim at bringing high performance, advanced computational capabilities to ensure the integral preservation of complex collections of electronic records indefinitely into the future. NARA is seeking solutions which not only overcome problems of technology obsolescence, but also enable both repositories and their customers to take advantage of continuing technological progress for information discovery, retrieval and delivery. These efforts are looking at technologies that are seen as key to the next generation national information infrastructure as the foundation for the development of a knowledge-based persistent object preservation method applicable to digital libraries, data centers, and a variety of business operations as well as archives. NARA is pursuing these goals through several multi-disciplinary, international research and development collaborations. Among others, NARA is a major sponsor of the International Research on Permanent Authentic Records in Electronic Systems (InterPARES) project. NARA has also become a co-sponsor, with the National Science Foundation, of the National Partnership for Advanced Computational Infrastructure.
Significant developments are now happening in a number of middleware areas, including directories, interrealm authentication, and PKI. Taken together, these activities have the potential to resolve the long-standing problems in the practical deployment of digital materials. This session will review the status of the Eduperson standard, an objectclass designed to facilitate inter-institutional applications. On that base, we have started to investigate the issues in a directory of directories for higher education, both on national and international scales. Using Eduperson attributes, we are designing an architecture, and public domain implementations, for interrealm authentication and basic authorization. This Shibboleth project will permit one to "authenticate locally and act globally". Lastly, PKI activities within higher ed have accelerated. Discussions will focus on these efforts and their value to the DLF community.
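The idea of authenticating locally and acting globally can be sketched in a few lines. This is only an illustration under stated assumptions, not the Shibboleth design: the function names, the `affiliation` attribute, the toy credential store, and the institution name are all hypothetical, and a real system would use signed assertions rather than plain dictionaries.

```python
# Toy credential store and directory for one home institution (all hypothetical).
HOME_USERS = {"jdoe": {"password": "s3cret", "affiliation": "faculty"}}


def authenticate_locally(institution, username, password):
    """Home institution checks credentials and returns attributes, not identity."""
    user = HOME_USERS.get(username)
    if user is None or user["password"] != password:
        return None
    # The assertion names the issuer and an attribute, but not the user.
    return {"issuer": institution, "affiliation": user["affiliation"]}


def authorize(assertion, trusted_issuers, allowed_affiliations):
    """Remote resource decides using only the asserted attributes."""
    if assertion is None:
        return False
    return (assertion["issuer"] in trusted_issuers
            and assertion["affiliation"] in allowed_affiliations)


assertion = authenticate_locally("university-a.example", "jdoe", "s3cret")
granted = authorize(assertion, {"university-a.example"}, {"faculty", "student"})
```

The point of the split is that the remote resource never handles the user's password; it needs only to trust the issuing institution and to interpret a shared attribute vocabulary.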
John Mark Ockerbloom, University of Pennsylvania
Mike Winkler, University of Pennsylvania Library
Millions of dollars spent on library resources can still be worthless if users cannot find the resources they need. Unfortunately, the digital and print resources of libraries are often over-compartmentalized, with print resources handled separately from digital resources, and digital resources organized by inflexible hierarchies and divisions that effectively hide from users many of the resources that would be useful to them.
In this talk, we will describe how we've addressed this problem in Penn's library. We are building tools and organizational structures to make materials easier to find, such as web-accessible, multidimensional databases of our resources, and "communities of interest" that serve as cross-format, research-oriented starting points for finding materials. We will show the tools we have built and are developing, and describe how we are designing our digital library architecture to support dynamic organizations of library resources that adapt to the needs of different users and groups.
Abstract to follow.
Columbia University's digital library program supports a wide range of collections and customers, including many not traditionally associated with the library. This session will describe our architectural approach to the broad scope of content areas and access issues, including distance learning and cost-recovery projects. Our technical framework addresses many dimensions of digital library activity simultaneously, such as preservation, citation, access management, discovery and the user's experience. As another work in progress, we look forward to your comments and discussion.
The University of Texas at Austin General Libraries is proceeding with a project to create an interim system to manage its burgeoning electronic resources, including e-journals. The presentation will provide some background illustration of the current problems being faced, the various methods by which staff have attempted to deal with them, and the current project that will hopefully bring a unified, albeit temporary, solution to the problem.
Many libraries are adding ever-increasing numbers of electronic resources to their holdings. Since most "home grown" and commercial library automation environments have been engineered to manage print-based collections, there are often many problems associated with the "handling" of these materials. At UT Austin, the process of managing access to these resources involves static and database-driven web pages, our OPAC, and our acquisitions system. This creates multiple points of management for catalogers, library web managers, serials coordinators, and anyone else involved in providing access to electronic resources.
Local cataloging procedures include editing local MARC records with URLs as well as downloading and editing OCLC MARC records where no local one can be assigned. An additional procedure undertaken by our collections officers calls for maintaining several sets of web pages with links to the same resources as are found in our OPAC. As these web pages cannot be generated directly from the OPAC information, separate databases are maintained in order to produce these web pages as well as to track the license requirements for each individual product. Yet a third procedure involves the maintenance of a finding tool (yet another database listing the individual titles found in the full text electronic resources) by the reference staff.
The proposed solution is to develop a unified electronic resources management system that removes, or at least partly automates, the updating of several separate databases and that responds to the need for access to these services from multiple locations. This system will prove useful, even if only used temporarily to bridge the gap until vendors respond adequately to this need. With a small extension, the same system can provide subject indexing and eliminate the need for static subject pages maintained by individual staff members.
It is hoped that this system will also be extensible to possibly include other UT System component holdings.
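The single-source idea behind such a system can be sketched as follows: one set of resource records, with each access page or report derived from it instead of being maintained by hand. This is a minimal illustration, not the UT Austin design; the record fields, titles, URLs, and function names are all hypothetical.

```python
from collections import defaultdict

# One record per electronic resource; every field name here is hypothetical.
RESOURCES = [
    {"title": "Journal of Examples", "url": "http://example.org/joe",
     "subjects": ["Chemistry", "Physics"], "license": "campus-only"},
    {"title": "Annals of Samples", "url": "http://example.org/aos",
     "subjects": ["Physics"], "license": "open"},
]


def title_list(resources):
    """Alphabetical (title, url) pairs, e.g. for an A-Z web page."""
    return sorted((r["title"], r["url"]) for r in resources)


def subject_pages(resources):
    """Map each subject to the sorted titles indexed under it."""
    pages = defaultdict(list)
    for r in resources:
        for subject in r["subjects"]:
            pages[subject].append(r["title"])
    return {s: sorted(titles) for s, titles in pages.items()}


def license_report(resources):
    """Map each license type to the titles it covers."""
    report = defaultdict(list)
    for r in resources:
        report[r["license"]].append(r["title"])
    return dict(report)
```

Because every view is computed from `RESOURCES`, correcting a URL or adding a subject term in one place updates the title list, the subject pages, and the license report together.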
After investigating existing digital library "solutions", the University of Virginia Library ascertained that an appropriate system did not yet exist. We decided to start building a system that would fit our needs, in the expectation that a vendor would come (you remember that movie, don't you?) to take over the development of a complete, manageable product. We are definitely not interested in being a vendor ourselves. We believe that we have completed the first phase of this process on both fronts. We have implemented the FEDORA architecture, developed by the Digital Library Research Group at Cornell, using an SQL database. We have built the system and have implemented a testbed of about 30,000 objects at this point, including digital images, electronic texts, electronic finding aids, and XML-encoded structural metadata objects that organize collections of art, archeology and architecture data. We have designed, but not yet implemented, object models for statistical and GIS datasets, and we have begun thinking about how to include journals and databases for which we negotiate access. To roughly scale-test the system, we added approximately 1 million dummy objects, and were very happy with the results. Beginning in the spring of 2000, we started discussing our system design with SIRSI Corporation. We have completed several rounds of discussions with the principals of the company and we are optimistic about the possibility of forming a development partnership with them. Our goal is to develop a system that manages and delivers all of our digital resources.
The University of Michigan, Oxford University, and the Council on Library and Information Resources have encouraged the research library community to fund creation of structured SGML/XML text-files for a significant portion of the Early English Books Online (EEBO) corpus of digital images created by Bell & Howell Information and Learning. These full-text editions, linked to the corresponding digital facsimiles of the works, will enable word or phrase searching across the corpus, significantly extending intellectual access to the content and making possible new avenues of historical research across a broad range of disciplines. The significance of the EEBO corpus (English language works--as well as those in other languages printed in England--from 1475-1700), along with the scale of the digital conversion in both image and text, make this the most ambitious conversion effort undertaken to date. The EEBO-Text Creation Partnership (EEBO-TCP) was proposed as the best means of addressing the following goals:
This paper traces the development of this groundbreaking project, with particular emphasis placed on the nature of the business and marketing plan and the likelihood that EEBO will become a model for future commercial efforts to digitize large-scale corpora in cooperation with the library community.
The CIC Wright American Fiction project is sponsored by nine member institutions of the Committee on Institutional Cooperation (CIC)*. This project will digitize almost 3,000 American novels from the years 1851-1875, as described in Lyle Wright's bibliography _American Fiction 1851-1875_. The participants will pay collectively for digitizing the page images from microfilm, with Indiana University converting the page images to text files using Optical Character Recognition (OCR) software and creating a searchable database similar to the Making of America collection.
Starting this fall, staff at participating libraries will begin the work of proofreading, editing and encoding the approximately 1,000,000 pages in the collection. We hope to achieve cooperatively the creation of a very large, consistently encoded textbase. To this end, we have developed encoding guidelines and held training workshops for participants. It is projected that this project will take three years. We hope to accomplish three goals:
* Participating CIC libraries include Indiana University, Michigan State University, The Ohio State University, Pennsylvania State University, University of Illinois at Chicago, University of Illinois at Urbana-Champaign, University of Iowa, University of Michigan, University of Minnesota, University of Wisconsin
Abstract to follow
Overview and discussion of key elements of the Collaborative Digital Reference Service (CDRS), a project being launched by the Library of Congress and member libraries to provide professional reference and research services to researchers any time, anywhere, via an international, digital network of libraries and related institutions. The service will use new technologies to provide the best answers in the best context, by taking advantage not only of the millions of Internet resources but also of the many more millions of resources that are not online and that are held by libraries around the world. CDRS supports libraries by providing them additional choices for the services they offer their end users. Libraries will assist their users by connecting to the CDRS to send questions that are best answered by the expert staff and collections of CDRS member institutions from around the world. Local, regional, national, and global: the library tradition of value-added service will be the CDRS hallmark.
Over the past several years, the California Digital Library has made virtual collection development a priority for the Online Archive of California. A statewide digital resource, the OAC integrates into a single, searchable database, finding aids to and digital facsimiles of the contents of primary resource collections throughout California. In this session, we will explore the lessons learned and consider the challenges ahead by reflecting on several OAC grant funded projects including the UC-EAD project, the Museum Online Archive of California, the Japanese-American Relocation Digital Archive, and the California Cultures project. We will specifically discuss collection development strategies, project leadership, selection mechanisms for primary content, and compliance with metadata and imaging standards with an eye to statewide collection building within a collaborative framework.
In 1999 the University of Georgia Libraries and the University of Tennessee Library were awarded a one-year National Leadership Grant from the Institute of Museum and Library Services to digitize 1,000 original documents and visual images relating to the Native American population of the Southeastern United States. These documents and images were selected from the most significant holdings at the two universities and from the Frank H. McClung Museum at the University of Tennessee and the Tennessee State Library and Archives in Nashville. The original documents reside in many separate manuscript collections within participating institutions, but as digital entities they are being brought together into a single electronic collection. In September of 2000 the universities were awarded a one-year extension of funding by IMLS, in which they will digitize another 1,000 documents and visual images from their collections, from the collections of the participants in the first grant, and from the Museum of the Cherokee Indian in Cherokee, North Carolina, and the Tennessee State Museum in Nashville. The final product will be a database of facsimile images and transcribed texts, individually cataloged and full-text searchable, mounted on GALILEO (Georgia Library Learning Online). The database will be free and available to the general public, but the primary audience is K-12 teachers, who can search, download, print, and make lesson plans as desired. A prototype database should be available by November of 2000.
Books are easy! Ninety-five percent of them exist in multiple copies and are now easily accessible through international databases such as RLIN. It is the scholarly resources hidden in archives that we need to make more visible. (David Stam, Syracuse University Librarian emeritus http://www.rlg.org/arr/).
For archives, manuscript repositories and libraries holding unique materials, digitization offers a promising means to expand access and reach new users. However, most digitization projects (particularly those involving scanning or mounting of images or full text) are expensive and technically complex. Consequently, large-scale digitization using professional standards seems out of reach for many smaller institutions, and even for larger ones that face budgetary pressures.
This paper will report the early results of a project at the University of Illinois Archives to construct a cost-effective digitization model for the James B. Reston Papers using professional metadata standards (the Encoded Archival Description XML mark-up language and an XML "page turner" called Ebind) and other open-source or low-cost tools. It is intended that all of the tools developed for this project will be made freely available by the time of project completion. They could easily be adopted by other institutions with very little modification and by archivists or librarians with little or no technical expertise in programming, web-site creation, or database management.
After reviewing the provenance and significance of the Reston Papers and the history of the project, this paper will:
In sum, this paper is intended to shed light on practical elements of policy decisions regarding a hybrid collection, especially in the area of selection and resource allocation.
The Digital Library Initiatives Department at NCSU Libraries is developing a Digital Media Collections Framework to address and enable the creation of desktop-to-enterprise collection databases by university students, faculty, and staff. The Libraries understands the inherent value and requirements of a database as an information access and management tool. As the demand and desire grow to build personal research, departmental, or university digital collections, this tool can provide a level of service to the university community that until recently was available only within the one-to-one service model. Developed as a result of interviews and on-going projects with university faculty and partner agencies, the Digital Media Collections Framework organizes database product research with the additional goal of sharing federated collections with constituents of the NCSU digital environment.
The strategic foundation of the Digital Media Collections Framework is the sharable Desktop Database model. Five component parts are being developed within this model: Media Description Framework, Media Guidelines, Database and Query Template, Vocabulary Control application, and Media File Naming and Directory Structure Guidelines. During the presentation, the component parts will be described and a current test case will be demonstrated.
There's been tremendous buzz about ebooks over the past year, but even the most ardent enthusiasts are lamenting the lack of available ebook content. The DLF libraries can play a big role in filling that content gap, as much of their digital content can be converted quickly and easily for delivery to multiple ebook platforms. The University of Virginia's Electronic Text Center has plunged into the ebook fray headfirst, converting over a thousand SGML-encoded texts to the OEB format and delivering them over the web as freely-accessible Microsoft Reader .lit files (with several additional formats forthcoming). These ebooks have been quite a success -- over 300,000 books were downloaded in the first 50 days of availability -- and have clearly expanded the audience for our existing collections.
This presentation will describe our experiments in building an ebook library at Virginia: our selection criteria, our methods of conversion, the technical barriers we encountered, and the statistics and user feedback that we are continuing to gather. It will also discuss the ways in which currently evolving ebook standards have ignored some issues of importance to the academic community, and will stress the need for research libraries to get involved in these standards efforts while they can still make a difference.
The DLF is involved in research and development work in six program areas briefly identified as digital library architectures, collections, standards and best practices, use, preservation, and institutional policies and strategies. It is acting as a catalyst in the development of innovative digital library collections and services. Lastly, it is developing a communications arm that will help professional staff at member institutions to share information about and keep up-to-date with digital library developments as they occur. The presentation will review the work of the DLF since the Forum last met in March 2000 and seek input from participants about directions the organization should try in the coming months.