The Developers' Roundtable (formerly Developers' Forum) is intended for
technology developers, technical managers, and representatives who
influence decision-making at their institutions. This gathering
provides dedicated time at the DLF Forum to meet and share problems,
ideas, and solutions of a technical nature.
In the first segment, from the safety of their seats, every attendee
delivers a brief (2-5 minute) informal statement addressing each of
these topics: (a) cool new technology of interest, (b) over-hyped
technology, and (c) opportunities for collaboration.
There will be a short time for discussion after each turn. In the
second segment, two special topics will be presented.
+ Using S3 and EC2 Amazon Web Services (AWS): A Case Study in Progress
Kris Carpenter Negulescu, Internet Archive
The Internet Archive has spent the better part of the past year
experimenting with large scale storage and data processing via AWS.
The talk will describe the effort to make the 1996-1999 portion of our
web archives full-text searchable (the "20th Century Find" service).
This includes why IA chose to work with the AWS alpha and beta
offerings, the key strengths and weaknesses of each service, what's
missing, and where we see AWS services evolving from here.
+ N2T Name Resolver Update
Rasan Rasch, New York University
This is a brief followup on a thread that grew out of the Developers'
panel on global identifier resolution at the Spring 2006 DLF Forum.
It will describe an active experiment to create a handful of mirrored
resolvers (California, New York, and Goettingen for now) addressing a
superset of the Handle/URN problem but with far simpler technology,
with results discussed this week at a German workshop (Trustworthiness
and Interoperability of Persistent Identifiers and Resolvers).
In the final segment, the group considers selection of a topic and a
temporary co-chair for a technical session and birds-of-a-feather (BOF)
meeting at the next main DLF Forum. If selected, the temporary co-chair
organizes and facilitates at that DLF session and BOF.
There's something very different about moving images. They're expensive to process and difficult to expose to users. They require us to engage with both new and obsolete technologies. They pose mysterious and intimidating rights issues. And they're multiplying rapidly. Special collections and archives are filling up with film, video and digital media, and most born-digital video isn't even being collected. And when it comes to providing access, we're losing the battle.
How can 21st-century archives work productively with these materials without repeating past mistakes? Most importantly, what does the history of archival engagement with moving images teach us about the future of archival access and our relationships with our users?
With an 800 (billion?) pound gorilla's entrance into the world of book publishing and libraries over two years ago, there has been considerable discussion among publishers about their digital strategies and how to capitalize on this potentially lucrative new vehicle for their content. While it is still far from clear how lucrative all of this will turn out to be, what has become clear amidst this frenzy of digitization activity is that the need for standards and structured metadata is more critical than ever before. This report will discuss some efforts to establish common ground on how digital content should be identified and how its metadata can be leveraged for an optimal discoverability experience.
Mass digitization with private sector partners presents libraries with an incredible opportunity. The terms of those partnership agreements vary. This panel will explore differences in those agreements and the impacts they may have. The moderator will report briefly on the findings of an investigation into public/private mass digitization deals, launched at a meeting of many of those involved in negotiations and based on an analysis of the publicly available agreements. The report will appear in the Nov/Dec issue of D-Lib. (Preprints will be available.)
The panelists will respond to a series of questions aimed to elucidate the pros and cons of different arrangements, with an eye toward stewardship responsibilities and the community benefit that can be realized with large numbers of digitized books. Sample questions: How long is the term of your agreement? How much do you expect to have digitized at the end of the term? What are the restrictions on your provision of access to your content? What restrictions survive the term? Are you allowed to contribute your digitized content to other aggregations? Are there related restrictions? Can others index your content and provide services to users? Are records for the digitized books being shared with union catalogs? Were you allowed to collaborate with others during the negotiation? This will not so much be a history of how these projects came about and what's been done so far, but an opportunity to surface the good that is coming of these partnerships as well as the impact of compromises.
Since the emergence of digital libraries and archives a decade ago, a number of projects have created useful systems and the communities that use and support them. More recently, several new non-profit organizations have emerged to lead these communities and continue developing the systems they depend on. The panel will review four well-known projects and the organizations that have emerged to support them; their directors will describe what these organizations do for their communities, and how they are managed and funded. Discussion will focus on how these organizations might evolve and what the larger digital library community wants from this new breed of organization.
Richard Akerman will approach the topic from a more technical level, and Chris Mackie will present from a high-level policy perspective. Discussion topics: Are there any common services out there that can be shared (e.g. OCLC xISBN)? Where are there gaps that could be filled with a services architecture, and how can we move beyond frameworks to running code? What is the relationship between library catalogue services and other research library services, e.g. article repositories? What experiences (positive and negative) have other organisations had with SOA? How are people using the work produced by the DLF architecture group? Are there barriers to adoption that the DLF and other organisations could help to overcome? What should the DLF's role be?
Services-oriented architectures (SOA) and the closely associated concept of Enterprise Services Architectures (ESA) are currently topics of much discussion and exploration across higher education. It is particularly important that librarians seeking to understand the potential impact of SOA/ESA on their own information systems understand how those concepts are changing views and strategies in the rest of the higher education enterprise as well. The proposed paper will provide an overview of the concepts and the rationales behind them, and survey current SOA/ESA activity in higher education, focusing especially on conceptual and strategic differences between commercial and open source approaches. After surveying some of the principal services-oriented systems projects currently underway, it will discuss the potential impact of SOA/ESA on the integration of library systems with the rest of the higher education enterprise, and sketch some opportunities and challenges that are likely to arise. The author's experience is on the IT side of the IT-Library divide, so the paper will emphasize "the view from IT" as, one hopes, a useful counterpoint to views of SOA/ESA from inside the library.
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of Compound Information Objects on the Web. These objects are compositions of related web resources that together form a logical whole. Examples are scholarly publications from ePrint or institutional repositories, information from social networking sites that wrap primary sources (e.g. images, presentations) with annotations and reviews, and products of eScholarship that mix text, images, simulations, and data. ORE standards build on web architecture principles and can therefore be leveraged by both repository systems and standard web clients and agents such as crawler-based search engines. This talk will describe the latest results of OAI-ORE work. These include a draft specification of the OAI-ORE data model, which models compound information objects as named resources known as Resource Maps (ReMs), and describes how these ReMs provide access to information about the compound object. The results also include a draft specification of an Atom-based implementation of the ORE data model. Providing an implementation of the data model in Atom will make the ORE standards available to a range of standard RSS/Atom-based applications and clients. The talk will also introduce use cases illustrating possible applications of the OAI-ORE work. An eChemistry project will deploy an OAI-ORE-based infrastructure for the exchange of molecular-centric information and the linkage of that information to researchers, experiments, publications, etc. A digital preservation prototype will illustrate how the Web-centric approach of OAI-ORE could empower the Internet Archive to readily archive compound information objects.
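To make the aggregation idea concrete, here is a minimal sketch of an Atom-style entry naming a Resource Map that aggregates several web resources. The Atom elements are standard; the "ore:aggregates" link relation is an assumption modeled on the draft ORE vocabulary, not a normative serialization.

```python
# Minimal sketch of an Atom-style Resource Map for a compound object.
# The "ore:aggregates" relation name below is hypothetical, based on the
# draft ORE data model rather than a finalized specification.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

def build_resource_map(rem_uri, aggregated_uris):
    """Build an Atom entry naming a Resource Map that aggregates web resources."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = rem_uri
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Resource Map for a compound object"
    for uri in aggregated_uris:
        link = ET.SubElement(entry, f"{{{ATOM}}}link")
        link.set("rel", "ore:aggregates")   # hypothetical relation name
        link.set("href", uri)
    return entry

rem = build_resource_map(
    "http://repository.example.org/rem/article-42",
    ["http://repository.example.org/article-42.pdf",
     "http://repository.example.org/article-42/dataset.csv"])
print(ET.tostring(rem, encoding="unicode"))
```

Because the map is itself a named web resource, a crawler or repository client can fetch it like any other page and discover the object's constituent parts from the links.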
Providing open access to scholarly work is an important digital library initiative. To improve our understanding of the self-archiving practice of campus faculty and the opportunity to self-archive in different disciplines, the University Libraries is conducting a study of faculty publication lists available on the web. In phase I of the study, the lists are analyzed to identify publication and access type. In phase II, all of the journal publications are analyzed to determine whether the work was or could have been self-archived in compliance with publisher policy. The findings to date are intriguing. Surprisingly little gray literature is self-archived. Even fewer journal articles are self-archived, and many of those that are breach publisher policy - though policy would allow self-archiving of at least half if not substantially more of the articles. The data suggest that there is no correlation between the opportunity to self-archive and faculty practice. Apparently faculty either do or do not self-archive, and many do not know or care about the details of publisher policies regarding the version that can be archived, embargo periods, or required text or links. It is likely that the variation in publisher policies and the requirements to change the descriptive text or online version after publication or to delay self-archiving are too complicated or time-consuming to encourage or secure total compliance. The data from the study will be used to inform faculty of the opportunity to self-archive in their discipline and to spark discussion of why and how to self-archive.
In recent years, libraries and archives have made tremendous strides in making print and image collections available digitally. Where are we with our moving image collections? This two-part panel aims to confront a series of interrelated questions on digitizing and born-digital moving image collections in order to catalyze discussion and planning among libraries on this critical yet under-examined area. How does the cultural prominence of web sites like YouTube impact how libraries and archives collect, preserve, present, and deliver moving images? How can we exploit our collective knowledge and experience in the creation and use of digital collections of text and still images to develop an informed strategy for video and film? Where are opportunities for innovation and collaboration? What are the risks and obstacles? Do existing assumptions about preservation and access need to be challenged?
The first panel will address high level questions concerning the landscape of moving image collections as they relate to digital libraries. Rick Prelinger and Hannah Frost will set the stage by raising probing questions. Peter Kaufman's presentation will reflect on the opportunities for collaboration between media producers and educators, librarians, and archivists in order to strengthen the role of media in teaching and learning. Barbara Taranto will explore the challenges and rewards of creating a place for moving image materials within the total scope of the digital library. Les Waffen will address the topic of partnerships between public and private entities to facilitate digital access to moving image collections.
Mass digitized collections currently complement library catalogs; as they grow and evolve, they may one day have the ability to replace existing book discovery services. The speakers will present findings from research conducted in Fall 2007 that demonstrate how Google Book Search, Microsoft's Live Search Books, and the Internet Archive's Open Library compare to highly regarded non-library book discovery interfaces (Amazon.com Books, LibraryThing) and next-generation library catalogs (NCSU, powered by Endeca; and U. Washington, powered by WorldCat Local). Areas of comparison include recall and precision for a variety of metadata-centric tasks (such as finding a known title, determining seminal works in a particular subject domain), full-text oriented tasks (such as appearance of a quotation in multiple books, detecting frequency and placement of search terms within the book), and other aspects of the book discovery experience (such as the ability to view relevant recommendations of similar items, ability to effectively disambiguate different editions of the same book). In addition to highlighting key findings that show the relative strengths and weaknesses of the platforms, the presenters will review current methodologies of linking into digital books from library catalogs, speculate on how book discovery will evolve, and identify key barriers that may impact patrons' future ability to have truly integrated, comprehensive, and compelling book discovery environments.
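As a quick reminder of the two core measures behind these comparisons, the following toy example computes precision and recall for a retrieved result list against a known relevant set. The titles are invented sample data, not findings from the study.

```python
# Toy illustration of the recall and precision measures used to compare
# book-discovery interfaces. The titles below are invented sample data.
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"Moby-Dick", "Billy Budd", "Typee"}
retrieved = ["Moby-Dick", "Typee", "Omoo", "White-Jacket"]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 2 of 4 retrieved are relevant; 2 of 3 relevant items retrieved
```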
Earlier this year, NCSU Libraries extended the scope of its next generation catalog efforts by deploying CatalogWS, a lightweight Web API to the NCSU Libraries' catalog data. This locally developed infrastructure has enabled us to deploy several standalone catalog interfaces optimized for different use contexts. CatalogWS functions as a data source for MobiLIB, NCSU's catalog interface optimized for mobile devices. CatalogWS is also the basis for new staff authoring tools such as a book cover visualization tool for large screen displays, and an advanced faceted search interface for generating custom catalog item lists for blogs and webpages. In this presentation we will describe our motivation for this work, provide a brief technical overview of the API, and provide capsule overviews of select CatalogWS applications developed to date.
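To suggest what consuming such a lightweight catalog API might look like, here is a hypothetical sketch: the endpoint path, parameter names, and response fields are all invented for illustration, and the real CatalogWS interface may differ.

```python
# Hypothetical sketch of consuming a lightweight catalog Web API such as
# CatalogWS. The endpoint path, parameters, and response fields below are
# invented for illustration; the actual API may differ.
import json
from urllib.parse import urlencode

def build_search_url(base, query, fmt="json", limit=10):
    """Compose a search request URL for a catalog web service."""
    return f"{base}/search?{urlencode({'q': query, 'format': fmt, 'limit': limit})}"

# A canned response standing in for what such a service might return.
sample_response = json.loads("""
{"results": [
  {"title": "Introduction to Solr", "call_number": "QA76.9"},
  {"title": "Faceted Search", "call_number": "ZA3075"}
]}
""")

url = build_search_url("http://library.example.edu/catalogws", "faceted search")
titles = [r["title"] for r in sample_response["results"]]
print(url)
print(titles)
```

The point of such an API is exactly this decoupling: the same machine-readable results can feed a mobile interface, a display wall, or a blog widget without touching the underlying catalog.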
Fifty percent of Americans in general and more than 90% of students report playing computer and video games (ESA, 2003). These games are widely believed to show promise as educational tools (Howell and Cannon-Bowers, 2003). In 2004, research on games as educational tools was begun on behalf of e-Education under the auspices of the Learning Technologies Project. The area of computer and video games needs to be divided into subcategories for further research: wide differences among types of games make it very difficult to make accurate general statements about games. The area of massively multiplayer online games (MMOs) has been receiving increasing attention for the potential power and influence that synthetic worlds may hold (Castronova, 2005). In the past two years, the social immersive synthetic environment Second Life has attracted millions of users and widespread attention from the media (Time, 2006). NASA is charged in its charter to enhance science and engineering education and to share its discoveries with the American people. In 2004, the Aldridge Report challenged NASA to find a way to tap into the power and popularity of computer and video game technology to accomplish that mission. This paper explores the steps the space agency has taken to accomplish those goals.
Digital library developments, Web 2.0 technologies, and interactive environments present opportunities to amplify library services for undergraduate students, connecting to their interests and passions and engaging their natural ways of communicating. This paper presents a case study of the strategies in play at the Undergraduate Library at the University of Illinois at Urbana-Champaign to re-invent its services and better connect with undergraduate students. Areas of activity include social networking, gaming, instant messaging, digital collections, a campus partners program, and re-configured physical spaces. Reflections on successes, and on experiments that offered lessons to learn, will be highlighted, as well as possible additional intersections between digital libraries and undergraduate student services.
In part 2 of this panel, a selection of current, on-the-ground activities will be highlighted. In conjunction with the preceding presentations, the series of talks promises to deliver a focused snapshot of the current state of moving image collections in digital libraries, with the intention of augmenting collaboration and strategic planning within the DLF community. Carl Fleischhauer will reveal the latest thinking on content wrappers, bitstream encodings, metadata, and format profiles as they pertain to the Library of Congress's ongoing acquisition of born-digital content, emerging reformatting practices at the Library's new facility for audiovisual collections, and a handful of notable NDIIPP projects. Kara Van Malssen will report on the two key areas of New York University's work for the NDIIPP "Preserving Public Television" project -- designing a digital repository for a variety of audiovisual materials and developing a metadata schema suitable for this unique content - as well as the workflows that bring together data from various disparate sources to create collections. Dave Rice will describe technologies and methods he's implemented to enable practical, efficient, low-cost and sustainable approaches to process and expedite access to audiovisual content. Helen Tibbo will describe one investigation of the VidArch research project: preserving 2008 US Presidential Election videos using a system that facilitates harvesting videos from YouTube, along with corresponding metadata and contextual information, with the goal of informing digital video curation policies.
The Zotero project, based at the Center for History and New Media at George Mason University, is rapidly building and disseminating free and open source bibliographic and research tools, focusing initially on a Firefox extension that over the past year has gone from an alpha release to use by over a quarter-million people in fourteen languages. In this presentation, the co-director of the project describes some of Zotero's latest features, which connect it to a wide range of software, services, and digital collections, and outlines a major expansion of the project for the coming year.
Pennvibes is a framework for content delivery and organization inspired by Netvibes, iGoogle, and Pageflakes. It is being developed at the Penn Libraries using AJAX, XML and Java technologies with the goal of creating a web presence that is drastically more responsive and flexible to the needs of our patrons. We also hope that Pennvibes provides an extensible delivery platform for arbitrary digital library content. When we go live (end of 2007), Pennvibes will enable our Librarians to build new reference pages in a few minutes, complete with custom-tailored (and proxied) lists of resources built from PennTags, integrated search tools (e.g., a Pubmed widget), RSS feeds, editable Webnotes, rotating image widgets, and a "My Library Account" widget that integrates items checked out, fines, and document delivery requests for the patron. In the second phase of the project, we would like not only librarians, but also Penn faculty and students to be able to create and modify Pennvibes pages, thereby making our Library Web site fundamentally more interactive and collaborative. In our presentation, we will demonstrate Pennvibes, outline its potentials for Library Web sites, and discuss the strengths and challenges of the underlying technology. * Please note that "Pennvibes" is an internal name that might be changed when we go live.
Since the beginning of the year, NYU has been designing a new facility for the accession of standard- and high-definition moving images to a file-based preservation repository. This has led NYU's Digital Library Technology Services group (DLTS) into the domains of signal flow, compression formats, and broadcast standards. Traversing these domains with a preservationist perspective has raised questions about information loss and the repurposing of industry-focused hardware and architectures to meet the requirements of preservation. We will present completed design drawings for the facility, discuss the expected workflow, and explain how each design decision was reached. We will highlight areas in which our facility - which uses industry hardware and software for playback, monitoring, capture and transcode - differs from the 'solutions' in which these components typically appear. In particular, we will contrast our decision to archive uncompressed video files with the industry's use of "visually lossless" archival assets. The design of the accession facility has given the DLTS group an opportunity to learn about the business and technology of video production and distribution. Both are undergoing disruptive change owing to the emergence of HD, tapeless production workflows, web distribution, and commercial digital video archives. Our presentation will touch on these issues and their implications for the accession of moving image content. Finally, we will discuss plans for further work, such as experimentation with mathematically reversible compression algorithms and analysis of the risks posed to moving image preservation by lossy compression and the industry's new generation of proprietary technologies.
Web sites such as YouTube have made video more pervasive on the internet than ever before, and the proliferation of consumer products for the creation and manipulation of digital video has enabled nearly anyone to create a video that can be posted to YouTube. In academic scholarship, the use of video has been growing as well: in recording musical or dramatic performances, in recording interviews for sociological or psychological data collection, in recording field work in ethnomusicology or anthropology, and in other significant scholarly endeavors. The Ethnographic Video for Instruction and Analysis Digital Archive (EVIADA) is one example of a project attempting to bring digital video scholarship to the internet. Using YouTube as a touchstone, the paper will discuss issues faced by any digital video project and how EVIADA chose to address some of them. These issues range from the creation of archival-quality digital video to the addition of descriptive and technical metadata to the presentation of the content in a web browser.
The proliferation of descriptive metadata standards, including structure standards, content standards, and controlled vocabularies, makes choosing among them difficult. When is using established standards for representing metadata natively at your institution the right choice, and when is branching out on your own the better option? This panel will present case studies in making these decisions, and look to identify trends and criteria that can be re-used by others. Arwen Hutt, University of California, San Diego, will discuss the Archivists' Toolkit and its ability to export the user's choice of EAD, MARC, MODS, and Dublin Core. She will describe the reasons behind the format-neutral approach, and the differences between AT output and native metadata in one of the output formats. Sarah Shreeves, University of Illinois at Urbana-Champaign, will discuss the institutional repository context and the metadata constraints imposed by commonly-used repository software. She will describe challenges inherent in a single metadata model for diverse materials and barriers to authority control, and outline a vision for IRs of the future. Jenn Riley, Indiana University, will describe contrasting cases from her institution, each in which one approach was taken and later changed. She will discuss methods for making good decisions on this issue within a rapidly-changing metadata environment.
The Media Grid is a public utility that provides digital media delivery, storage and processing (compute) services for a new generation of networked applications. Built using Internet and Web standards, the Media Grid is an open and extensible software development and delivery platform that is designed to enable a wide range of applications not possible with the traditional Internet and World Wide Web. Applications enabled by the Media Grid include: Immersive Education; on-demand digital cinema and interactive movies; distributed film and movie rendering; truly immersive multiplayer games and virtual reality; real-time visualization of complex data (weather, medical, engineering, and so forth); telepresence and telemedicine (remote surgery, medical imaging, drug design, etc.); vehicle and aircraft design and simulation; and similar high-performance media applications. This paper will provide a basic introduction to grids and grid computing, and explain how open, international standards enable the Media Grid to be a global "grid of grids" that extends to digital libraries.
The Data Documentation Initiative (DDI) is an international effort to establish a metadata standard for describing social science data. This presentation will begin by providing the audience with the background and history of the DDI. This will be followed by a brief introduction to the current DDI (version 2.1) and its community of users. Finally, I will report on the latest update to the DDI (version 3.0) that is currently under public review, as well as the conceptual framework upon which it was developed, the data life cycle.
This presentation will demonstrate the progress made improving discovery and use of digital objects through DLF Aquifer development. Attendees will see the newly designed DLF Aquifer portal, optimized to expose digital content to commercial search services. DLF Aquifer collections will also be shown through an integration with Zotero. Presenters will explain the underlying technical infrastructure and the rationale for choosing specific tools, technologies and techniques. The portal is built using Ruby on Rails with Solr for indexing; XSL is used to transform MODS records into Solr's indexing format. Developers will demonstrate how asset actions and thumbnail images are now incorporated into the portal. Members of the DLF Aquifer team will highlight contributions the DLF Aquifer initiative is making to the digital library community through standards and best practices development. The presentation will provide a brief overview of the recently released MODS Guidelines Levels of Adoption, explaining how the levels link to discovery and delivery functionality. Based on expressed need within the DLF Aquifer initiative and throughout the digital library community, the DLF Aquifer team is adapting development methods used in industry to fit the digital library environment. Attendees will learn about the methods used and hear how the approach can be applied to both distributed, collaborative virtual teams and co-located teams in individual institutions.
Blacklight is the University of Virginia Library's project to develop a "Next Generation" search and browse interface for our OPAC and our digital collections together, encompassing multiple metadata formats. Users start their experience with the available global facets already shown. Users can explore with a search and by filtering with the terms in the facets, mixing the use of search and faceted browse to make it easier to perform complex discovery operations. In our current work we are focusing on how Blacklight can enhance the discovery of our music collections and our digital collections, as we progress from the prototype first shown at the spring 2007 code4lib pre-conference to a production implementation. We have indexed ~3.7 million MARC records and 35,000 XML objects from our Fedora-based Digital Collections Repository. This presentation will discuss the technology and the process for the implementation, and demonstrate the current application. Blacklight was developed using Erik Hatcher's Solr Flare with Lucene and Ruby on Rails, implemented by Library staff and the company OpenSource Connections.
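The search-plus-facet interaction model described above can be sketched in miniature. Blacklight delegates this work to Solr; the toy records and facet fields below are invented purely to illustrate how facet counts guide filtering.

```python
# Minimal sketch of the search-plus-facet interaction model a next-generation
# catalog like Blacklight exposes. The real system delegates counting and
# filtering to Solr; records and field names here are toy data.
from collections import Counter

records = [
    {"title": "Goldberg Variations", "format": "Musical Score", "era": "18th c."},
    {"title": "Kind of Blue",        "format": "Sound Recording", "era": "20th c."},
    {"title": "Moby-Dick",           "format": "Book", "era": "19th c."},
    {"title": "The Rite of Spring",  "format": "Musical Score", "era": "20th c."},
]

def facet_counts(recs, field):
    """Count how many records fall under each value of a facet field."""
    return Counter(r[field] for r in recs)

def apply_facet(recs, field, value):
    """Narrow the result set to records matching a chosen facet value."""
    return [r for r in recs if r[field] == value]

# A user sees the global facets first, then narrows by one of them.
print(facet_counts(records, "format"))
scores = apply_facet(records, "format", "Musical Score")
print([r["title"] for r in scores])
```

Showing the counts before the user commits to a filter is what makes faceted browse complement free-text search: every narrowing step is guaranteed to return results.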
Ever wondered how to keep track of the file formats that exist in your digital repository, and whether those formats are still viable? If you're a repository manager who has no control over the file formats that make their way into your repository, you'll be particularly familiar with this issue. What is the risk of obsolescence to your precious content? Previously, the only way to answer this question was to manually browse file format information in a file format registry, such as The National Archives' PRONOM or the Library of Congress's Sustainability of Digital Formats website. The Global Digital Format Registry (GDFR) will be the next generation of these, operated in a networked, collaborative fashion. However, none of these registries enable you to monitor the information about file format obsolescence in the context of the contents of your own collection repository. The Automatic Obsolescence Notification System (AONS) has been produced by the Australian Partnership for Sustainable Repositories (APSR) project in partnership with the National Library of Australia (NLA) during 2007. This software allows users to automatically monitor the status of file formats in their repositories, make risk assessments based on a core set of obsolescence risk questions, and receive notifications when file format risks change or other related events occur. It has been released as open-source software on SourceForge and could prove to be highly useful for the digital library community. This presentation will focus on why a tool like AONS is needed and some of the issues encountered during the project, as well as providing a demonstration of the software.
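The kind of check such a tool performs can be sketched simply: inventory the formats present in a repository, then flag those that a registry marks as at risk. The registry entries, extensions, and risk labels below are invented for illustration and do not reflect AONS's actual data or rules.

```python
# Sketch of an obsolescence check in the spirit of AONS: tally the formats
# in a repository and flag those a registry marks as at risk. The registry
# contents and risk labels below are invented for illustration.
from collections import Counter
from pathlib import PurePosixPath

format_registry = {              # hypothetical registry data
    ".pdf":  "low",
    ".tiff": "low",
    ".doc":  "medium",
    ".wpd":  "high",             # e.g. an aging word-processor format
}

def format_inventory(paths):
    """Tally file extensions across the repository's holdings."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)

def at_risk(inventory, registry, elevated=("medium", "high")):
    """Return formats present in the collection whose registry risk is elevated."""
    return {ext: registry.get(ext, "unknown")
            for ext in inventory
            if registry.get(ext, "unknown") in elevated}

holdings = ["a/report.pdf", "a/scan.tiff", "b/letter.wpd", "b/memo.DOC"]
inv = format_inventory(holdings)
print(at_risk(inv, format_registry))
```

A notification system would run this comparison periodically and alert the repository manager whenever a format's registry status changes.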
An open discussion session at last spring's DLF forum made it clear that libraries were requiring and attempting to develop a wide variety of innovative applications supporting the discovery and use of library resources. These applications, taken as a whole, go well beyond the user interfaces offered by existing integrated library systems (ILSs), but may depend on the data and services that they manage. In response, the DLF convened an ILS Discovery Interface Task Force to make technical recommendations for machine-accessible interfaces that could be used with ILSs to support new discovery applications. This task force is preparing a draft recommendation to be released prior to the Fall Forum. In this presentation, we give an overview of the recommendation, its rationale, its main features, and how we hope it can be developed and applied. We invite all interested parties to attend this overview and join in the open discussion session to follow, to help us produce the best final version of our recommendations. More information on the task force and its work can be found at https://project.library.upenn.edu/confluence/display/ilsapi/Home
This is a session for in-person feedback and discussion on the ILS Discovery Interface Task Force draft recommendations for interfaces to the ILS for discovery applications. (The draft recommendations will be released before the Fall Forum, and are also the subject of an earlier session in this Forum.) Members of the task force will be present to hear your suggestions and answer your questions. We invite librarians, developers, ILS mavens, and other interested parties to join in the discussion. More information on the task force and its work can be found at https://project.library.upenn.edu/confluence/display/ilsapi/Home
The open source JHOVE format identification, validation, and characterization tool has proven to be a successful component of many repository and preservation work flows. Through the course of its widespread use, however, a number of limitations imposed by its current design and implementation have been identified. To remedy this, Harvard University, Portico, and Stanford University are collaborating on an NDIIPP-funded project to develop a next-generation JHOVE2 architecture. Among the enhancements of JHOVE2 are streamlined APIs; a more sophisticated data model supporting arbitrarily-nested, independently-formatted bit streams; a generic plug-in mechanism to permit the construction of arbitrary stateful work flows; and automated policy-based assessment based on prior characterization and locally-defined heuristics and rules. (The full project proposal is available at http://hul.harvard.edu/jhove/JHOVE2-proposal.doc.)
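The stateful plug-in workflow mentioned above can be illustrated with a small sketch. This is not the JHOVE2 API (which is Java-based); it is a hypothetical Python illustration of the pattern: a chain of characterization modules sharing mutable state, so that later modules (such as policy assessment) can build on what earlier modules (such as format identification) recorded.

```python
# Hypothetical sketch of a stateful characterization workflow; module
# and field names are illustrative, not part of JHOVE2.

class Module:
    def run(self, state):  # each plug-in reads and updates shared state
        raise NotImplementedError

class IdentifyFormat(Module):
    def run(self, state):
        data = state["bytes"]
        # trivially identify by magic number (PDF vs. unknown)
        state["format"] = "PDF" if data.startswith(b"%PDF") else "unknown"

class AssessPolicy(Module):
    def run(self, state):
        # hypothetical local policy rule: only PDF is acceptable
        state["acceptable"] = state.get("format") == "PDF"

def characterize(data, modules):
    state = {"bytes": data}
    for m in modules:          # stateful pipeline: later modules see
        m.run(state)           # everything earlier modules recorded
    return state

result = characterize(b"%PDF-1.4 ...", [IdentifyFormat(), AssessPolicy()])
print(result["format"], result["acceptable"])  # PDF True
```

Because the pipeline is just an ordered list of modules over shared state, arbitrary workflows can be assembled without the modules knowing about one another.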
The project partners are actively soliciting the advice of the JHOVE user community to ensure that JHOVE2 will better meet the needs of digital repository and preservation practitioners. This Discussion Session will be used to clarify functional requirements and technical specifications. Participants are encouraged to provide use cases relevant to their local needs or the perceived needs of the wider community. The session will also include discussion of ways in which JHOVE2 can be structured to facilitate easy integration into existing or planned systems and workflows, and to encourage third-party modification and extension.
The METS Editorial Board and the PREMIS Editorial Committee are actively seeking the input of METS and PREMIS implementers on practices for encoding preservation-related metadata in METS documents for digital resources. This advice is being sought to ensure that METS and PREMIS better meet the needs of digital repository and preservation practitioners, both for achieving interoperability and for the most effective use of the METS and PREMIS schemas within the same context. This discussion session will be used to clarify current practices and discuss what might be "best" practices for implementing PREMIS within METS. In particular, issues of redundancy and extensibility need to be resolved so that common practices may be adopted, facilitating document sharing and tool development. Participants are encouraged to bring use cases relevant to their local needs or the perceived needs of the wider community. The session will also include discussion of ways in which METS and PREMIS might be adapted to more easily integrate into existing or planned implementations. Outcomes: face-to-face discussion of issues arising out of the draft Best Practices document prepared by the PREMIS Editorial Committee, with the hope of achieving agreement on the approaches that should be taken, and a better understanding of the needs behind suggested changes to the METS and PREMIS schemas.
The basic components of the Mellon-funded American Social History Online architecture for DLF Aquifer are in place. The next step is to develop a scalable workflow to add the quality content that the Aquifer collections working group has targeted for inclusion. This session is designed to provide practical advice for potential collection contributors as well as to stimulate community discussion about "deep sharing".
The session will begin with an open discussion of the DLF Aquifer value proposition. Why does it make sense to contribute collections to Aquifer? DLF Aquifer core team and working group members will be on hand to review the standards and communities of practice DLF Aquifer is using. Forum attendees who would like assistance thinking through or setting up a workflow to prepare collections for Aquifer are encouraged to participate. A portion of the time will be dedicated to a facilitated discussion about sharing collections, including a proposed workable definition of what we mean by "publicly accessible". Participants will be encouraged to share their views about the possibility of making digital objects or surrogates from individual collections available for capture and re-use in other systems. What kinds of agreements would need to be in place? Would it be realistic to map legacy rights statements for digital collections to Creative Commons licenses?
The term "publishing" is becoming as difficult to use meaningfully as "digital library." We all recognize that "publishing" is crucial to a research university, but as we are encouraged to imagine new forms of publishing, new services to support it, and new roles for publishers and librarians, do we know if we are all talking about the same thing? The discussion leaders lead a collaborative publishing program, which has so far been defined around platform and product development. While we believe that our experience in working together may be of interest, we primarily hope to engage the audience in a broader conversation that helps clarify the prospects for working across institutional boundaries in order to define publishing activities, and the services that support them, in all their various forms. Possible discussion questions: What are we talking about when we use the term "publishing?" What do publishing services mean on our campuses now, who is offering them, and what are they? What do librarians, technologists, and publishers need to know about each other's domains to successfully collaborate? What are the threats to each that emerge from collaboration? What types of services and/or products are promising and useful for collaborative activities? When and where is scale in activities important and less important? How are technology platforms constraining our ability to imagine, provide, or define publishing services? What can we do to move past these constraints? What problems should we try to solve? What organizational/content/technological models are needed to solve them?
DLF has built an island in Second Life to explore the game's potential use for extending library services. Now what?
An international coalition of image creators, image distributors, image users, and cultural heritage institutions has developed the Picture Licensing Universal System ("PLUS"), an integrated system of standards for use in expressing rights and attribution metadata for still images. The PLUS Coalition, a non-profit, apolitical, and industry-neutral standards body, invites the participation of all digital libraries in the continued development of version 2.0 of the PLUS standards. By collaborating with photographers, illustrators, publishers, designers, stock image agencies, advertising agencies, researchers, museums, and traditional libraries, digital libraries will ensure that the PLUS standards will simplify and automate library ingestion and management of rights information. Join Professor Jeff Sedlik, President & CEO of the PLUS Coalition, as he describes the interests and efforts of PLUS. See www.useplus.org.

The copyright status of US books published between 1923 and 1963 is of particular interest, as they required renewal registration, and listings of renewals have historically been challenging to search. Stanford University has compiled those records in the Copyright Renewals Database (http://collections.stanford.edu/copyrightrenewals), and is now examining the potential to automate the analysis of the copyright status of a work or group of works, with the understanding that the CRD database would be integral to such a system. Outcomes: * Development of formal and informal paths of communication among picture industry professionals and DLF members engaged in digitizing and making accessible still images under reasonable and accessible licensing terms. * Greater awareness of and familiarity with the Copyright Renewals Database, a review of potential pitfalls with its use, and discussion of potential development of a copyright analysis system.
NYU Libraries has partnered with Dr. Robert McChesney, Professor Emeritus of Middle Eastern and Islamic Studies, and Project Advisor, Afghanistan Digital Library, to create the Afghanistan Digital Library. This library, a virtual and comprehensive collection of all material published in Afghanistan between 1871 and 1930, is meant to reveal the history of Afghanistan via primary source material. This history has been obscured by a combination of political instability, intentional destruction, diaspora, and curatorial practices within the country itself. Recently, in an effort supported by NYU Libraries, the National Endowment for the Humanities, and the Ministry of Culture and Information of Afghanistan, the project was extended to include materials located within Kabul, Afghanistan. In spring 2007, NYU sent a three-person team to the National Archives in Kabul to set up a conservation laboratory and a digitization workstation and to train National Archives staff in conservation and digitization skills. The NYU team consisted of Melitte Buchman, Digitization Specialist, and Peter Magierski, cataloger, both of NYU, and John Dean, Preservation and Conservation Librarian at Cornell University. In the first half of the presentation, the state of visited libraries and archives, cataloging and authority control, and the protection of material by the Afghans will be discussed. In the second half we will discuss the training materials developed for teaching digitization to the staff at the National Archives, what worked and what did not; unexpected technology issues; and finally some thoughts on why the staff at the National Archives are excellent and precise imagers.
For the past year, Stanford University Libraries has been engaged in defining the components and overall design of an emerging cyberinfrastructure to support its evolving portfolio of digital content, applications and services. This effort has been driven by the realization that our current library information systems are inadequate to address the demands posed by numerous, large scale digital initiatives. Mass digitization projects (both internal and with Google), production ingest into the Stanford Digital Repository, the addition of born-digital materials to our collections, and the rolling development of multifaceted discovery and delivery systems have all illustrated the need to develop a new information infrastructure for the library of the future. Informed in part by the recent Mellon-funded investigation into ESBs (Enterprise Service Buses), Stanford has now completed an initial definition and design of the core components of this cyberinfrastructure. The resulting architecture comprises three tiers: underlying core infrastructure (e.g., workflow engine), supporting a suite of reusable services (e.g., persistent URL service), which can be assembled into higher-order library applications (e.g., a text mining portal). This conceptual design addresses needs illustrated in nine different use narratives, each of which is grounded in unmet functional requirements for our emerging cybrary. This paper frames Stanford's approach to defining a digital library's requirements for a cyberinfrastructure, and posits an architecture to manage digital assets and services where the ILS and digital repositories are both present but where neither is central to the solution.
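The three-tier composition described above (core infrastructure supporting reusable services, assembled into applications) can be sketched in miniature. All names here are hypothetical illustrations of the layering, not Stanford's actual components.

```python
# Illustrative three-tier sketch: infrastructure -> service -> application.
# Component names are invented for illustration only.

# Tier 1: core infrastructure (here, a trivial key-value store)
class Store:
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

# Tier 2: a reusable service built on the infrastructure
# (analogous to the persistent URL service mentioned above)
class PersistentUrlService:
    def __init__(self, store):
        self.store = store
    def register(self, purl, target):
        self.store.put(purl, target)
    def resolve(self, purl):
        return self.store.get(purl)

# Tier 3: an application assembled from services
def citation_line(purl_service, purl):
    return f"Available at {purl_service.resolve(purl)}"

svc = PersistentUrlService(Store())
svc.register("purl:demo/1", "https://example.org/object/1")
print(citation_line(svc, "purl:demo/1"))
# Available at https://example.org/object/1
```

The point of the layering is that neither the ILS nor the repository is central: applications depend only on the service tier, and services can be rebacked onto different infrastructure.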
In 2006, NSF sponsored a workshop on the challenges of scientific workflows, citing these as a key and underrepresented ingredient in cyberinfrastructure. Recommendations included the assertion that data created through workflows should include representations of those workflows as metadata. Libraries interested in developing data management capabilities should investigate ways to assist not only with the metadata creation necessary to describe the final data products associated with research results, but also with modeling research workflows and documenting the relationships between research data, tools, and processes. This paper outlines the exploration and design phase of a pilot project to include the raw and reduced data from the PRIsm MUlti-object Survey (PRIMUS) cosmology research project at New York University as a digital collection. Key elements of the proposed work are to examine the research workflow, construct a generalized domain model, and generate specifications for the accession of Open Archival Information System (OAIS) Information Packages to NYU's digital repository. The domain model will (1) manage variability within the workflow (e.g., software versioning, heterogeneous coordinate systems), (2) allow service provider interoperability among large data sets, and (3) allow users to access data objects according to their logical relationships. Project organizers provide an overview of the PRIMUS research workflow and the projected collection size and scope, and summarize data structure, components, and associated peripheral objects. We then describe an exploration and design phase methodology that will facilitate data preservation activities that operate in concert with research activities, tracking data throughout its lifecycle.
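The recommendation that workflow representations accompany the data can be sketched as a minimal OAIS-style information package whose provenance section records each processing step. The field names and file names below are hypothetical illustrations, not a standard schema or actual PRIMUS artifacts.

```python
# Hypothetical sketch: a minimal OAIS-style Submission Information
# Package (SIP) bundling data objects with the workflow that produced
# them. Field names are illustrative only.

import json

def make_sip(data_files, workflow_steps):
    """Bundle data objects with workflow provenance metadata."""
    return {
        "content_information": [{"file": f} for f in data_files],
        "provenance": {
            # each step records tool and version, so reduced data can
            # be traced back through changing software versions
            "workflow": [
                {"step": i + 1, "tool": tool, "version": version}
                for i, (tool, version) in enumerate(workflow_steps)
            ]
        },
    }

sip = make_sip(
    ["field01_raw.fits", "field01_reduced.fits"],      # invented names
    [("extract_spectra", "2.1"), ("fit_redshifts", "0.9")],
)
print(json.dumps(sip["provenance"]["workflow"][0], sort_keys=True))
# {"step": 1, "tool": "extract_spectra", "version": "2.1"}
```

Recording provenance this way keeps the workflow queryable alongside the data, rather than buried in lab notebooks or scripts.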
As early as 1997, the National Library of Australia (NLA) recognised the need to improve its processes for managing rights information for its collections. Information about rights, rights holders and permissions is still stored via a variety of mechanisms - paper folders, card files, paper or electronic official records, catalogue record notes. In some cases, rights information is only available in the memories of certain key staff. Finding rights information about a given collection item is somewhat like finding the proverbial needle in a haystack, especially for areas of the organisation wishing to access the information held by other work areas. Now the NLA has a strategy for the future of rights information - a new system for staff to record and access rights information in regard to its collections. Version 1.0 of the system, in use since September 2007, includes functionality for recording access conditions and permissions in association with particular digitised collection items. Work on the next version is currently underway, and aims to broaden the functionality to the recording of access conditions and policies across collection items "owned" by a given rights holder. The third version of the software will expand the existing functionality to bring physical collection items into scope. The software has been built in a modular architecture that could potentially be reused by other institutions with a small amount of modification. This presentation will focus on some of the issues encountered in the project, and demonstrate the existing functionality of the Rights Management System as it exists in the context of the NLA's digital collections.
Sixteen METS Profiles have been registered in the 4 years that the METS Board has been registering profiles. After a brief introduction to METS Profiles, this paper examines the registered profiles, comparing and contrasting approaches that profile authors have taken. It also discusses common complaints about profiles, and speculates about future development in the area of METS Profiles.
The World Wide Web has unleashed the unprecedented sharing and dissemination of information, and the recent phenomenon of "Web 2.0" takes things one step further by turning the Web into a vast distributed platform for collaboration and participation. Web applications have evolved to enable the formation of large interactive communities that share both ideas and objects. Scientists and scholars are increasingly looking to social networking and information sharing technologies as a means of making revolutionary changes in the process, pace, and quality of intellectual discourse and scientific results. Similarly, educators and museum curators are realizing that the very technologies that students are using in their leisure time just might be an enabling mechanism for improving both the process and quality of education in the United States. But what happens if a new breed of web-based, collaborative applications results in information becoming locked up in a new wave of clever, yet idiosyncratic, systems that are not built to facilitate sharing across boundaries, and are not attentive to the long-term sustainability of digital information? Fedora Commons is a new non-profit organization established to provide a robust, open-source technology platform to enable revolutionary change in the ways scientists, scholars, and educators produce and share the intellectual outputs of their work - while simultaneously ensuring the durability and longevity of the record of knowledge resulting from this work. This paper will discuss how Fedora Commons will evolve the Fedora repository framework and build a new model of community participation to meet the challenges emerging in several key areas: (1) open-access publishing, (2) e-science, (3) new models for scholarly communication, and (4) collaborative, semantic digital libraries. The Fedora Commons mission is initially supported by a $4.9 million 4-year grant from the Gordon and Betty Moore Foundation.
Connotea allows researchers to create a bibliography and set of bookmarks to online resources. They can share or hide the items in their library as well as tag and annotate them.
It stands on its own as a valuable resource for individuals and research groups due to its ability to extract citation information from web pages, but where it has the potential to become a truly useful part of the researcher's toolkit is through the social aspects of the site.
In this talk I will present Connotea and discuss its uptake, usage, and our future plans for it. I will mention its uptake as a platform for others to build upon, and I will also focus on the interest shown by librarians and what we can do with Connotea to help them integrate it with library systems.
Panelists will review outcomes and promote discussion about the Greater Philadelphia Geohistory Network (GPGN, http://www.philageohistory.org/geohistory/index.cfm ), focusing on public/private partnerships; integrated presentation of disparate data; collaborative technical architectures; and future applications on mobile devices. In 2005, the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) received a planning grant from the Andrew W. Mellon Foundation to create the GPGN. Building on work done by the Cartographic Modeling Lab at the University of Pennsylvania, the Athenaeum of Philadelphia, and the City of Philadelphia Department of Records, GPGN is a geographically-based Web resource on the history, culture, and architecture of Philadelphia using historical maps, the City's robust GIS system of parcel maps, and the extraordinary concentration of primary image, manuscript, archival, textual materials and data held by PACSCL members.
Panelists will discuss the administrative and technical models that are being developed for the Network. The panelists have been responsible for developing the GIS platform and related products, such as PhillyHistory.org and the Pocket Culture Browser; the Philadelphia Architects and Buildings database; and the Neighborhood Information System. They participated in a PACSCL symposium (December 2005) where a range of interest groups--scholarly, commercial, historical, legal, governmental, and civic--discussed the components, features, and technologies for the network. PACSCL is considering next steps and would benefit from DLF members' advice and feedback about their future options, especially technical possibilities, for using GPGN both as an interface to a network of library and data collections and as a site for creating and archiving new materials.
Discussions of the cyberinfrastructure needed for curating scientific datasets often tend towards those produced by 'big' e-science - that is, the instrument-driven, terabyte- and petabyte-sized datasets. But what about 'small' science - the datasets produced in small labs that are not necessarily instrument-driven, but are collected and pulled together by humans? What are the curation needs for this category of data? As we have gone to faculty to talk about IDEALS, the UIUC institutional repository, long-term preservation of and access to datasets has consistently come up, particularly for small datasets that do not have a national or disciplinary home. This paper will present raw perceptions from discussions with a chemist, an anthropologist, a botanist, an engineer, and others; this is not meant to be a scientific survey, but a report from the field that can perhaps complement and inform discussions around e-science and cyberinfrastructure.
This presentation describes Papyri.info (http://papyri.info), an innovative scholarly portal developed by Columbia University with funding from the Andrew W. Mellon Foundation. Papyri.info brings together content from several distinct and separately managed databases in order to provide scholars worldwide with an integrated view of the relevant metadata, images, texts, translations, and critical apparatus needed for the study of ancient papyri and ostraka (inscribed pottery fragments). Papyri.info was put into production in July 2007 as an operational prototype and currently incorporates content from APIS (Advanced Papyrological Information System, hosted at Columbia), DDBDP (Duke Databank of Documentary Papyri) and HGV (Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens). Other related scholarly resources may be integrated in the future. Areas covered in the presentation will include: the use of Apache's Jetspeed-2 portlet technology as the preferred development framework for the interface, to facilitate the creation of separate "channels" of information for component data sources; the selection of software tools and metadata protocols; and the use of the commercial "eRez" tiff image server and the Flash-based "FSI" image viewer as a powerful and flexible image delivery system. The presentation will also provide a preview of the 2007-2008 development agenda now being carried out by Columbia University, Duke University and others, which includes: implementation of a Web Services-based papyrological text identification server; the migration of the legacy Beta code (Greek), SGML-based Duke Databank to a new Unicode, TEI/EpiDoc-encoded text base; and the creation of a new full-text search engine for ancient Greek supporting both string and "lemmatized" searching.
This talk describes how two grant-funded projects at Yale University work with a virtual community of contributors to create a digital resource called AMEEL (Arabic and Middle Eastern Electronic Library). AMEEL engages individuals and institutions in virtual space to construct a sustainable community comprising librarians, scholars, developers, and vendors engaged in providing access to scholarly digital content about the Middle East. AMEEL will employ virtual alliances on two fronts: during development and for sustainability. For example, we are digitizing scholarly journals from the Middle East locally to form the core content of AMEEL. Because Arabic texts present special challenges, we are forging relationships to distribute digitization workflow tasks remotely while we formulate "best practices" to share with academic libraries. With a group of pioneer Middle Eastern libraries, we hope to make document delivery requests a new regional tradition while importing digital copies of these requests into AMEEL. Concurrently, we will connect to existing digital content from U.S. and European academic libraries and commercial publishers such as JSTOR and Brill to make integrated content -- searchable in Arabic and English -- available to a global community. To accomplish this, we are collaborating in virtual mode with technical partners in Europe and the Middle East as we develop a scalable infrastructure, using the FEDORA framework, to permit indexing and searching of Arabic text. The challenges inherent in working in a virtual and distributed way are numerous. This talk will describe several, especially those related to digitization, repository development, and long-term sustainability.
Copyright © 2007 Digital Library Federation. All rights reserved.