In the face of rapid change, digital libraries are increasingly being required to deliver new, useful services to their customers. These services are being delivered via the web, leveraging a variety of modern technologies. With increasing data volumes and more distributed delivery requirements, these services need to be scalable, reliable, and cost effective. Amazon Web Services enables application developers to focus on delivering quality services by providing a highly scalable platform built upon lessons learned from operating one of the world's largest and most reliable web operations. AWS is an ideal platform that not only provides the elasticity to meet demand and grow as needed, but also an architecture that allows you to build highly available services that fulfill the needs of digital asset curation, delivery, and storage. We will discuss some of the core principles of Amazon Web Services, the advantages of utility computing, and examples of how researchers and service providers can derive great benefits.
The evolution of a networked society implies the modification of production processes, cultural practices, political organizations, and social institutions. Leading from that, we offer the idea of networked learning to suggest multi-modal and multi-nodal learning that transpires across diverse physical and virtual environments, much of which is mediated by networked computing and digital media. In this view, geographic location, demographic identification, and organizational affiliation are becoming less relevant to the production and consumption of knowledge, leaving the future of learning influenced more by an individual’s technological connections, personal motivations, and informal interactions than it ever has been in the past. And, while we have begun to understand better the digital media practices of youth, today we still know very little about the actual (let alone potential) impacts of digital media on the educational programming of schools, libraries, museums, and science centers. More importantly, we know little about how these different institutional spaces currently interact or counteract with each other or the digital lives of their youth constituents to effect learning. Achieving networked learning between and across these institutional spaces as well as digital sites requires a view of how students currently weave together these various learning environments and a vision of how social and technical infrastructure could help them link the access, content, and experience of these environments better in the future. This talk will present such a vision of networked learning for 2020 and will describe our three-part research, design, and innovation strategy for turning that vision into action in New York City.
Building a culture of collaboration (Christenson and Wilkin): A defining aspect of HathiTrust is its instantiation as a single entity governed by a community of libraries. HathiTrust seeks effective ways to share work across our institutions. We will discuss both current efforts and proposed models for collaboration.
Large-scale search (Burton-West): How feasible is it to support full-text search over millions of volumes using open source tools? The presentation reviews a benchmarking strategy using Solr and the data collected on variables such as memory, CPU, shards, and load. The presentation will explore strategies for deploying a production version of full text search in HathiTrust.
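As a rough illustration of the kind of distributed search the abstract describes, the sketch below builds the query string for a sharded Solr request. The shard hostnames and field name are hypothetical, not HathiTrust's actual deployment; a real installation would list its own cores in the `shards` parameter.

```python
from urllib.parse import urlencode

# Hypothetical shard hosts; a real deployment would list its own cores.
SHARDS = [
    "solr1.example.org:8983/solr/books1",
    "solr2.example.org:8983/solr/books2",
]

def build_sharded_query(q, rows=10):
    """Build the query string for a distributed Solr search.

    Solr fans the request out to every core named in the 'shards'
    parameter and merges the ranked result lists before responding."""
    params = {
        "q": q,
        "rows": rows,
        "shards": ",".join(SHARDS),
    }
    return "select?" + urlencode(params)

# "ocr_text" is an assumed field name for the full OCR text.
print(build_sharded_query("ocr_text:whaling"))
```

Benchmarking variables like memory and CPU come into play because each shard executes the query independently, so the slowest shard bounds overall response time.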
Large-scale ingest (Feeman): Supporting ingest at a rate of hundreds of thousands of digitized volumes per month must take into account throughput, validation, and 'package' transformation to the repository format. The design of GROOVE will be discussed, as well as problems encountered and their solutions.
Copyright review and the CRMS (Karle-Zenith): HathiTrust undertakes a process of making copyright determinations and sharing those determinations. Additionally, IMLS has funded the development of a Copyright Review Management System. Thousands of 1923-1963 US works have been released thus far. This presentation will report on the process of making copyright determinations and a status report on the CRMS.
Accessibility (Chapman): HathiTrust is now making works (including those that are in-copyright) available to persons with print disabilities through screen readers and digital Braille devices. We will review the mechanisms for doing this work and the usability testing performed to aid in its design.
This session will report on assessment of American Social History Online, including use by students, faculty and graduate students, observations about the collections included, the features and functions of the web site, and the integration with Zotero. The presenters will also demonstrate the MODS explorer and discuss results of testing for compliance of incoming metadata records with the Guidelines for MODS Implementation developed by the Aquifer metadata working group. As DLF Aquifer comes to an end, the presenters will share ideas for best practices in collaboration learned during their work together.
In the Fall of 2006, the University of Illinois Library at Urbana-Champaign began a partnership with the Open Content Alliance (OCA) as part of a broader large-scale digitization initiative. In concert with this new partnership, a primarily automated workflow was established to maximize online access and exposure for Illinois books digitized by OCA. This work paralleled efforts to enhance visibility of other locally digitized collections. Multiple approaches were developed to increase the visibility of digital surrogates: disseminating metadata about digitized content via a range of systems such as our local OPAC, OCLC WorldCat, our local institutional repository, the locally-developed Illinois Harvest Portal, and an OAI-PMH Data Provider; creating splash pages for digitized volumes at the item level and serial level; and providing unique, persistent handle URIs for each book digitized from Illinois collections. The workflow was also adapted to link print resources indexed in the Library OPAC with the publicly accessible digital copies of books digitized by OCA or deposited in HathiTrust. Allowances were made in the workflow for specialized processing of some resources to meet users' needs in certain scholarly contexts -- e.g., implementing METS Navigator (a METS-based page turner application developed by Indiana University) for our digitized brittle books and for our Project Unica collection (a rare books digitization project). This presentation will discuss workflow strategies and methods used to provide access to digitized content, and will describe motivational use case scenarios.
Libraries and archives have important primary source material that exists only as black and white and color negatives. Many archives have resisted digitizing and creating access to these materials. Some have digitized the materials with disappointing results. Unlike most archival material, negatives are not the "thing of record" but have a relationship much like the score (the negative) to the performance (the print or positive digital file). At NYU Libraries we have had the opportunity with several of our negative collections, including the Tamiment Library and Robert F. Wagner Labor Archive's Abraham Lincoln Brigade, to explore this paradigm for black and white negatives and to expand this discovery process into color negative holdings. Sensitivity to the different nature of negatives and new tools available such as DNG and 16-bit workflow tools have changed our ability to convert these hidden assets into preservation masters and access copies. As film stock ages, fades, and becomes embrittled, it is important to start processing these collections and opening the contents, in a respectful and visually acceptable way, to the light of day. Our surprising finding is that this can be done in a systematic and efficient way.
A joint effort by Cornell University Library and Cornell faculty in Classics is exploring the extension of OpenURL to provide system-independent linking between citations of Classical literature and an increasing array of available online resources and services in Classics. Textual references in Classics are commonly to FRBR work-level entities (e.g., Ovid, Amores), independent of any particular edition or translation. Online abstracting and indexing services, as well as Classical scholarship available online, contain many such citations. At the same time, there are an increasing number of Classical texts and resources available online. This project has explored the advantages and challenges of building links among such resources using OpenURL. Such linking, which could be extended to other domains, will allow more seamless movement from scholarly resources to original texts and translations, improving digital services in the humanities. This project has been supported by a recent planning grant from The Andrew W. Mellon Foundation to the American Philological Association. To date, project work has focused on the creation of a canonical citation OpenURL metadata format, a strategy to address implementation challenges, and a prototype of a Classical literature knowledge base and linking system. By design, the metadata format, implementation scheme, and knowledge base structure are independent of Classics and can be deployed in other domains that frequently cite texts independent of editions or translations. The project also demonstrates how knowledge bases may be chained together to provide enhanced services to users, a model which may have wider application within the OpenURL community.
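To make the linking idea concrete, the sketch below assembles an OpenURL-style link for a work-level citation. The resolver URL and the `rft.*` keys are illustrative assumptions, not the project's actual canonical-citation metadata format, which this abstract does not reproduce.

```python
from urllib.parse import urlencode

BASE_RESOLVER = "https://resolver.example.edu/openurl"  # hypothetical resolver

def work_level_openurl(author, work, passage=None):
    """Sketch of an OpenURL 1.0 style link for a work-level Classics
    citation (e.g. Ovid, Amores 1.1), independent of any edition or
    translation. Key names are illustrative, not the project's
    published format."""
    params = {
        "url_ver": "Z39.88-2004",  # the OpenURL 1.0 standard identifier
        "rft.au": author,
        "rft.title": work,
    }
    if passage:
        params["rft.passage"] = passage
    return BASE_RESOLVER + "?" + urlencode(params)

print(work_level_openurl("Ovid", "Amores", "1.1"))
```

A knowledge base behind the resolver would map such a work-level citation to whichever editions, translations, and commentaries are actually available online.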
The University of Minnesota is developing EthicShare, an open virtual research community for scholars in the field of bioethics, funded by the Mellon Foundation. The site's development process has resulted in an environment, accessible by scholars regardless of institution, which combines a repository of sources (traditional scholarly materials, relevant popular press resources and other related content) with social, collaborative tools and features.
Thus far, social networking services have seen only selective uptake among scholars, yet these environments hold opportunity to enable collaboration and, potentially, advance new forms of scholarship. But, while possibilities exist, there are many barriers involved in the creation of inter-institutional research sites. Copyright and licensing issues surround the citations and full text of the aggregated scholarly resources. The current behaviors and motivations of bioethics scholars, especially those surrounding the tenure and promotion process, inhibit the adoption of social spaces. Additionally, scholars' unfamiliarity with social networking technologies requires education and policy creation.
We will discuss the extensive assessment of bioethics scholars that determined both the appropriate content and the collaborative features we incorporated into the site. Our strategy, an iterative process of assessment and development, will be explored. Finally, we will outline the development of the Drupal-based tools and features within the EthicShare site that ensure scholars from all institutions can participate. This will include the Link Resolver module built to blend network-provisioned data from the WorldCat Registry service with locally stored user configuration information.
Copyright is a tricky subject for digital libraries. As libraries, we want to promote the rights of users and educate the public. As digitizers, we rely on fair use and the public domain to create and share our collections. As owners of digital content, we want to protect our investment, receive credit for our work, and prevent commercial (ab)use of our materials. Attaching copyright statements to our digital collections is one of the easiest ways for us to achieve these goals. Unfortunately, digital library literature is silent on the subject, and consensus has yet to emerge on what these statements should say, where they should be placed, or even what they should be called. As a result, practice varies widely within and between institutions, and statements are often misleading, inaccurate, or nonexistent.
This presentation will share the results of a study of copyright statements attached to digital collections created by DLF member institutions. The questions posed by the study include: Do such statements exist? What kind of content do they include? Do they provide an accurate representation of the copyright status of the items? The study also addresses how well we are fulfilling our obligation to educate our users on their rights under copyright law, including fair use and unrestricted use of public domain materials. The presentation will conclude with recommendations on how to address the problems highlighted by this study and develop best practices for copyright statements attached to digital collections.
The Hub and Spoke (HandS) Project, one of four UIUC-based technical architecture projects funded by the Library of Congress' NDIIPP program, proposes a paper on our framework for repository interoperability and preservation. The paper will have four parts. The first part will introduce the framework. The second part will provide an overview of our METS profiles and examples of HandS preservation packages. The third will be a demonstration of our workflow manager client application, supporting submission and retrieval of digital packages between the repositories in our system, and a discussion of the technical protocols involved in the HandS workflow cycle, such as our "Lightweight Repository Create, Retrieve, Update, and Delete Service" (LRCRUD) and the Simple Web-service Offering Repository Deposit (SWORD). The final part will address the metadata interoperability layer; in particular, we will present and analyze the challenges we faced in crosswalking MODS metadata to the Scholarly Works Application Profile (SWAP), the FRBR-based metadata format used by SWORD.
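As a hedged illustration of the deposit step mentioned above, the sketch below builds the minimal Atom entry a SWORD 1.x client POSTs to a repository collection URI. The title, author, and packaging profile shown are examples only; an actual HandS submission would carry a full METS/SWAP package rather than this bare entry.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_sword_entry(title, author):
    """Build a minimal Atom entry of the kind a SWORD client deposits.

    Fields are illustrative; a real HandS deposit would reference a
    packaged METS object conforming to one of the project's profiles."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    author_el = ET.SubElement(entry, "{%s}author" % ATOM_NS)
    ET.SubElement(author_el, "{%s}name" % ATOM_NS).text = author
    return ET.tostring(entry, encoding="unicode")

# Headers a SWORD 1.x deposit request would typically carry; the
# X-Packaging value names the packaging profile of the payload.
headers = {
    "Content-Type": "application/atom+xml;type=entry",
    "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",  # example profile
}

print(build_sword_entry("Sample work", "A. Author"))
```

The complementary LRCRUD service named in the abstract would cover the retrieve, update, and delete operations that the SWORD deposit profile does not.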
EUI, the Ethnography of the University Initiative (http://www.eui.uiuc.edu/), is an innovative program at the University of Illinois that offers students the opportunity to conduct original ethnographic and archival research and archive it for future students to build upon. EUI supports faculty in their efforts to bring the research discovery process into the classroom and works with IDEALS, the University's institutional repository, which maintains a permanent online archive of student research. In this session I'll describe EUI and discuss the roles of participants, give examples of student learning and research enabled by EUI and IDEALS, and reflect on what we've learned from the initiative.
The task of preserving our digital heritage for future generations far exceeds the capacity of any government or institution. Responsibility must be distributed across a number of stewardship organizations running heterogeneous and geographically dispersed digital preservation repositories. For reasons of redundancy, succession planning and software migration, these repositories must be able to exchange copies of archived information packages with each other. Practical repository-to-repository transfer will require a common, standards-based transfer format capable of transporting rich preservation metadata as well as digital objects, and repository systems must be capable of exporting and importing information packages utilizing this format.
The Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (TEI), first published in 1994, quickly became _the_ standard for encoding literary texts. The TEI was widely adopted by libraries for its promise of discoverability, interoperability, and preservation of electronic texts, but the TEI's monolithic nature inspired the codification of library-specific practice. Since 1999, libraries have relied on the TEI Text Encoding in Libraries Guidelines for Best Encoding Practices (http://www.diglib.org/standards/tei.htm) to steer their work with encoded texts. In April 2008, the TEI in Libraries special interest group (SIG) and the DLF-sponsored TEI Task Force partnered to update the Guidelines. The revision was prompted by the release of P5, the newest version of the TEI, and the desire to create a true library-centric customization not constrained by the TEI Lite schema.
The revised Guidelines contain updated versions of the widely adopted encoding 'levels' - from fully automated conversion to content analysis and scholarly encoding. They also contain a substantially revised section on the TEI Header, designed to support interoperability between text collections and the use of complementary metadata schemas such as MARC and MODS. The new Guidelines also reflect an organizational shift. Originally authored by the DLF-sponsored TEI Task Force, the current revision work is a partnership between members of the Task Force and the TEI Libraries SIG. As a result of this partnership, responsibility for the Guidelines will migrate to the SIG, allowing closer work with the TEI Consortium as a whole, and a stronger basis for advocating for the needs of libraries in future TEI releases.
If you work with encoded text or simply want to learn more, please join us for the TEI Text Encoding in Libraries birds of a feather session. We will provide an overview of the Guidelines and the principles that governed our revisions. We will also seek feedback on the work we have done so far and solicit input for future planned revisions.
As online teaching and learning activities on our campuses move increasingly into course management systems (CMS) such as Blackboard, Sakai, Angel, Desire2Learn, and Moodle, many libraries and librarians are looking at how best to expose their collections, services, and expertise in these environments, and there have been several presentations at past DLF Forums on this topic. These efforts often involve work across instructional technology, reference, and library systems groups, and can pose challenges related to both technical issues and organizational culture.
As one example of such an effort, the Sakaibrary project, a joint effort between Indiana University and the University of Michigan with past support from the Andrew W. Mellon Foundation, has implemented functionality in the open source Sakai course management system to allow faculty and students to create and share reading and reference lists and to search for licensed and open full text resources using library metasearch tools and Google Scholar. The project has also created a prototype tool to allow librarians and faculty to create 'research guides' within Sakai to provide focused access to resources and services relevant to a particular course, discipline, or research task.
The purpose of this BOF is to get together librarians and technologists who are working on or interested in CMS-library integration issues to learn from each other through informal discussion and sharing of use cases, experiences, plans, and ideas.
Vice Provost and Director of Libraries Susan K. Nutter invites you to a cocktail reception May 5, 2009, 5:30-7:30 p.m. in the D. H. Hill Library Special Collections Reading Room on the campus of North Carolina State University.
RSVP by noon on Friday, May 1st, to Terry Hill at 919-515-7188 or terry_hill "at" ncsu "dot" edu
Transportation will be provided to and from the Raleigh Marriott City Center. The first bus will depart from the hotel at 5:15 and the second bus will depart at 5:45.
This event is sponsored by The Friends of the Library of North Carolina State University.
This session offers a simple framework for evaluating data storage solutions for long-term digital storage. We start with the premise that there is no one-size-fits-all data storage solution, and that each institution's needs are potentially different from its peers'. Next we observe that the data storage industry is very confusing. There are hundreds of vendors, each offering unique twists on disk, tape, and optical media. They all have their reference accounts. They all have polished sales presentations that make their products sound like the best. They all offer special terms to win you as a client. How do you identify what really matters for your specific needs and avoid some common pitfalls?
Digital librarians' efforts to preserve digital collections for future use include archiving the digital objects and associated metadata into preservation repositories. The integrated Rule Oriented Data System (iRODS) provides for the enforcement of preservation policies such as replication of collections so that there is no single point of failure. iRODS combines information models from the digital library community with archivists' preservation models and archival storage technology from the data grid domain.
The speaker will discuss federated preservation data grids and demonstrate a prototype application from an ongoing project to use the OAI-PMH to transfer digital objects and metadata from the Odum Digital Archive at UNC-CH into the National Archives and Records Administration (NARA) Transcontinental Persistent Archive Prototype (TPAP) preservation grid, an extension of which uses iRODS. The transfers implement community policies for the generation of descriptive metadata, choice of semantics, and access permissions.
The success of this proof-of-concept means that any digital library or archive that is an OAI-PMH-compliant Data Provider can upload its collections into the preservation data grid, and demonstrates that the iRODS software is flexible enough to provide not only "dark" archival storage and enforcement of domain-specific policies, but dissemination and data transfer as well.
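The harvesting step described above can be sketched with the standard OAI-PMH response structure. The snippet below parses a hand-written ListRecords fragment and pulls out record identifiers, which a harvester feeding a preservation grid would then use to fetch each record's metadata and objects; the repository name in the sample is made up.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"

# A trimmed, hand-written ListRecords response; the repository
# identifier shown is illustrative, not the Odum Archive's actual one.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:odum.example.org:study-001</identifier>
        <datestamp>2009-04-01</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

def record_identifiers(xml_text):
    """Extract record identifiers from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}identifier" % OAI)]

print(record_identifiers(SAMPLE))  # → ['oai:odum.example.org:study-001']
```

Because OAI-PMH is a fixed, widely implemented protocol, any compliant Data Provider can be harvested this way without repository-specific code, which is what makes the proof-of-concept generalizable.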
The Extracting Metadata for Preservation (EMP) Project, funded by the National Digital Information Infrastructure and Preservation (NDIIPP) Program, addresses the ongoing challenge of identifying proper names to improve authority control in metadata creation and extraction, as well as accuracy in end-user information access via web-based search and retrieval. As a collaboration among the University of Illinois at Urbana-Champaign, OCLC, and the University of Maryland, EMP researchers bring multidisciplinary perspectives from the library, computer science, and linguistics communities to the problem of high-quality identification and disambiguation of names.
This presentation reports on three activities. First, we describe an open-source name extractor tool developed by computational linguists at Illinois, configured with a plug-in interface that lowers barriers of access to state-of-the-art research tools. Second, we demonstrate the use of this tool by integrating it into two applications developed at the collaborating institutions: summary views of FRBR-ized MARC records hosted at OCLC and metadata generated by CLiMB (Computational Linguistics for Metadata Building) at Maryland. Finally, we describe the results of evaluation that compares the output of EMP with previously available solutions.
This research will be of interest to those who develop search interfaces, metadata creation tools, institutional repositories, and applications requiring names management.
Historically, controlled vocabularies maintained and provided by the Library of Congress have proven difficult to access and process. Some vocabularies, like the LC Name Authority File and LC Subject Headings, have required substantial payment to simply access the data. Others, while freely available, have only been provided within simple lists that lack web addressability for the values within the vocabulary. Both approaches required human intervention to make use of the data.
In Spring 2009, the Library of Congress will launch a new service called id.loc.gov to expose its controlled vocabularies and the values within them as first class web resources. To drive this application, LC primarily uses Simple Knowledge Organization System (SKOS) Resource Description Framework (RDF) metadata. Following the principles of Representational State Transfer and the Linked Data movement, each vocabulary and every value within will be addressable as dereferenceable HTTP URIs to provide the machine readability that has long been requested by developers. This new functionality will allow users to tie LC vocabularies and individual terms directly into their metadata. These URIs will allow for HTTP content negotiation so that machines and human users alike can access a suitable format of the data. If the data at hand doesn't suit the needs of the developer, one can perform custom queries using the site's SPARQL endpoint. Alternatively, one can freely download the data for free in numerous RDF formats, and process the data locally.
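To illustrate what a client receives after content negotiation, the sketch below parses a small, hand-written SKOS RDF/XML description of the kind id.loc.gov returns when a request carries `Accept: application/rdf+xml`. The concept URI and label shown are illustrative, not guaranteed to match a real LCSH record.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Trimmed, hand-written SKOS/RDF of the kind returned for
# "Accept: application/rdf+xml"; the concept URI is illustrative.
SAMPLE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://id.loc.gov/authorities/subjects/sh0000000">
    <skos:prefLabel>World Wide Web</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

def pref_labels(rdf_xml):
    """Extract skos:prefLabel values from an RDF/XML response."""
    root = ET.fromstring(rdf_xml)
    return [el.text for el in root.iter("{%s}prefLabel" % SKOS)]

print(pref_labels(SAMPLE_RDF))  # → ['World Wide Web']
```

Because every vocabulary value is a dereferenceable HTTP URI, a cataloging tool can store the URI in its metadata and re-resolve it later to pick up label changes.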
In February 2009, Indiana University released Variations, an open source software package that helps libraries provide online access to streaming audio and scanned score images for teaching, learning, and research, with support from a grant from the Institute of Museum and Library Services. The Variations system, currently in use at Indiana University and four other institutions, provides a repository for storing audio files and score images, tools to assist library staff in ingesting audio and score content, and end-user tools for delivery, annotation, and pedagogical use of music content. A key feature for libraries is a flexible access control and authentication system, which allows libraries to integrate with existing local authentication and authorization systems and to set up access rules based on their own local institutional policies. The system, written primarily in Java and distributed under a BSD-style license, makes use of a number of other open source tools, including Sun's MySQL database and Apple's Darwin Streaming Server. Currently, the end user tools are provided as part of a Java Swing desktop client for Windows and Mac OS X, but the most commonly used tools are being ported to browser-based Web applications.
In this presentation, we will provide an overview of Variations functionality and system architecture and discuss what is required to bring up and support the system at an institution. We will also talk about options currently under review for ongoing support and development of the system and raise some general issues about sustaining discipline-focused open source software.
By invitation only.
Contact Nancy Hoebelheinrich at nhoebel "at" stanford "dot" edu if interested.
Copyright © 2009 Digital Library Federation. All rights reserved.