The Distributed Library:
OAI for Digital Library Aggregation
IMLS 2004 National Leadership Grant for Libraries
Research and Demonstration
National Impact
The Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH) has proven itself as a
straightforward and functional mechanism for sharing metadata across systems
for the purpose of building focused services. We have collectively proven the
protocol and its utility; however, as Martha Brogan notes, "there are numerous
practical, technical and philosophical impediments to the full realization of
OAI-based services . . ." The multi-stranded research proposed in this grant will
address these various impediments, and move us closer to the goal of federated
collections across institutions, and the ability to create richer services for
our library users.
This issue of a distributed
library made up of multiple collections in a much more malleable, easily
accessed service landscape is of sufficient importance to the major research
libraries who make up the DLF that--after a strategic planning session and
eight months of discussion and planning (February through October, 2003)--we voted unanimously
to make the creation and sustenance of a distributed, open, digital library the
organization’s overarching strategic goal, taking us full circle to the first
item on our founding 1995 charter. This endeavor, we believe, has far-reaching
impacts inside and outside the DLF institutions, and the work done during this
grant will be of fundamental importance to our ambitions for a comprehensive
finding system.
Our team of
partner organizations draws on very active OAI practitioners: the University of
Michigan (Michigan) will host an experimental harvesting service and metadata search
portal for the purposes of the grant, with the harvesting service functioning
as a test-bed for our research and as an encouragement for broad participation
within the DLF; University of Illinois at Urbana-Champaign (UIUC) will develop a registry to allow closer relationships between
harvesters and providers; Emory University (Emory) will oversee the training and consultancy
portion of our work, and we expect to see their established and thriving American
South collaboratory as a major subject-focused proving ground during the
IMLS grant period.
Workflow and Training
One impact will be to learn
how to ingrain the creation of item-level metadata into the digitizing
workflows of the DLF--we are keen to know how to do this and will look to the
workflow evaluation and subsequent training provided here by Emory for answers. The
barriers to the routine provision of sharable metadata are--we believe--not
substantial, but they are real and not clearly understood. By canvassing the
DLF member institutions we expect to learn a lot more about what changes in
local day-to-day practices and our collective sense of mission need to occur
for us to share digital object metadata readily, and start to rely on its
existence in our local service provision. Our ambitions for richer deep
sharing and more tailored local library services have at their foundation the
need for a trustworthy and well-populated finding system for our distributed
content.
Enriching the OAI Record
The ability to examine the
form and use of collection-level OAI records, and to build on work currently
being funded at UIUC, is another important area, as is the question of when
we need a metadata base that is richer than the unqualified Dublin Core
mandated by OAI. We feel confident that digital library providers of records
for digital library aggregations and services will be willing to follow rather
more nuanced and prescriptive Best Practices for the creation of the digital
library OAI record; indeed, we are typically eager for such guidance and
advice, as the success of past best practice guides from the DLF, RLG, IMLS, and
elsewhere attests. This IMLS grant will cull the lessons learned from the
first wave of harvesting services and reflect back our collective observations
with regard to the formation of the catalog records. Early harvesters have
spent inordinate amounts of time normalizing and completing the records in
order to build the services they wish to provide, and we must improve that
situation if a distributed finding system is going to scale and thrive.
Coordinating Services and
Providers
In the conclusion of her publication Survey
of Digital Library Aggregation Services, Brogan highlights the lack
of any comprehensive registry of OAI service and metadata providers: "it
is difficult (at best) for users to know the extent of services
available." Recent work at UIUC and within the
European Union, along with tools like Virginia Tech's
Repository Explorer,
is helpful in this regard, but clearly more research is needed to enable OAI
service providers to better discover and learn about OAI metadata providers
(and vice-versa).
As part of the
proposed research and using as a starting point the experimental registry of
OAI metadata providers already developed by UIUC (http://oai.grainger.uiuc.edu/registry),
we propose to investigate and experiment with potential methods to enable
better discovery of OAI service and metadata providers. The outcome will have
positive national and international impact on how OAI service and metadata
providers discover each other and learn about available metadata and services.
Adaptability
OAI-PMH is based fundamentally
on omnipresent standards and protocols such as HTTP, XML, and the Dublin Core
metadata schema. It has proven itself as a functional and stable protocol,
and is increasingly widely used. What we provide by way of a prototype
registry, or services crafted with deep input from users, or guidance on the
creation of records more amenable to use in digital library services, will be
usable far beyond the confines of the DLF institutions who comprise the test-bed
collections and user communities. The ultimate impact, we believe, will be that
many other libraries will follow the example of the DLF and its recommendations
for OAI metadata tailored for digital library services; there is a strong history
of the DLF recommendations and best practices having a wider influence than the DLF
membership. As with other project components, software and schemas developed
as part of the upgrading of the UIUC registry of OAI metadata providers will be
made available under an open source license. This work will exploit collection-level
description being done by the UIUC IMLS Digital Collections and Content
project, the UK's RSLP (and related) projects, and the Dublin Core Collection
Description Working Group. Schemas developed as part of the OAI registry
component enhancement work will be especially adaptable for use in sharing
information about collections generally and OAI metadata providers in
particular.
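As a minimal illustration of how lightly the protocol sits on these standards, the following Python sketch builds a ListRecords request URL and reads identifiers and titles out of an unqualified Dublin Core response. The base URL, the sample record, and the function names are hypothetical and do not describe any DLF, Michigan, or UIUC service; only the verb name, the oai_dc metadata prefix, and the XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a (hypothetical) provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return the identifier and dc:title values of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2004-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical digitized letter</dc:title>
          <dc:date>1865</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, both of which this sketch leaves out.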
Design
Context. The DLF has been actively engaged with the OAI community since its
beginning, providing some of the funding for the early work (with CNI) and
through its members having an early commitment to providing OAI records and
harvesting services. The California Digital Library, Carnegie Mellon
University, Cornell University, Emory, Indiana University, Library of Congress, MIT, North Carolina
State University, and the universities of California, Berkeley, Chicago, Illinois, Michigan,
Pennsylvania, Tennessee, Virginia, and Washington all show up as providers of OAI records currently, and as the letters of
support attached to this proposal demonstrate, other DLF institutions are
committed to much deeper engagement with sharable metadata and the
next-generation services they allow.
Audience. The material to hand most easily for this research and
demonstration project is that which has already been digitized in DLF
libraries. Typically, to date, this biases heavily towards text and images,
and predominantly in the humanities and social sciences (so much so, in fact,
that in our discussions of the DLF distributed library we have raised again the
notion of using it as the seed for a national library for the humanities). Brockman
et al., in the abstract to
Scholarly Work in the Humanities and the Evolving Information Environment, have reminded us that "in particular, the
findings emphasize how important it is for libraries to chart their
evolutionary course in close consultation with scholarly user communities." Working
with our scholarly advisory group (see below) we plan to extract out of the
larger collection of records some area or areas of subject focus--The American
South and/or American literature would be likely candidates. This builds on
existing communities of interest and our own sense of what is in our collections
and--if at all possible--would allow us to extend our examination of how
scholars and teachers use a metadata-based finding system into some ongoing
research and classroom projects. The
DLF Registry of digital collections, currently being
updated with material up to December 2004, will be of considerable aid in this
work.
Assessment of Needs. From their own work and that of close colleagues, the
participants from Emory, Michigan, and UIUC have a firm grasp of what
relevant work has been done to evaluate OAI-based services, and what is going
on to improve them.
In addition, the DLF commissioned Brogan to undertake a
Survey of Digital Library Aggregation Services, which provided an overview of a diverse set of more
than thirty digital library aggregation services, organizing them into functional
clusters, and then evaluating them from the perspective of an informed user. Most of the services under review
rely wholly or partially on OAI-PMH, and the
conclusions of this study are ones that will guide part of our research. Her
findings urge cautious optimism but show plenty of room for major improvement
in precisely the areas we concentrate on here.
Most important is the current IMLS-funded
work at UIUC, Digital Collections and Content (DCC), a three-year
effort to build a national infrastructure for
adaptable, interoperable, and sustainable digital collections, which uses
OAI-PMH to harvest metadata from current and past IMLS National Leadership Grants
(NLG) awardees with digital collections. The current DLF submittal to the
IMLS NLG program proposes explicitly to build on and extend important facets of
the current UIUC project. Specifically, this new and
deliberately complementary proposal focuses on a more homogeneous universe of
digital collections and institutions, which will allow the DLF researchers to
go further in developing more advanced standards, recommended best practices,
and training approaches expressly tailored for academic research libraries.
Development of these standards will build on research undertaken on our broader
based project at Illinois into metadata schema implementations and authoring
practices within the community of IMLS grantees, and will inform and be
informed by the ongoing involvement of many DLF members in the evolution of
metadata standards like Dublin Core, MODS, EAD, and METS. This work promises to
further advance and prove in yet another domain the usefulness of OAI as a
technology for sharing descriptive metadata.
The work proposed by the DLF also will extend and make more valuable the work being
done at UIUC on collection-level description. The DLF proposal promises to
take the collection-level description best practices developed at Illinois in
support of the collection registry component of its project and identify a
range of standard ways in which such rich collection-level description can be
included in OAI metadata provider services (e.g., through use of more complex,
multi-faceted metadata schemes such as METS). This is the logical next step
towards more closely coupling technologies used for the dissemination and
management of item-level metadata and collection-level descriptive metadata,
and is a necessary prerequisite towards merging human-generated metadata about
collections with descriptive metadata derived by automated text mining. More
seamless joining of collection-level description, item-level description, and
automated full-text retrieval technologies is essential in the long run to
enable capable end-user portals to our digital libraries.
Implementation
Based on this range of
experience, knowledge, and expertise, we will undertake this research and
demonstration by performing the following tasks, expressed here in four 6-month blocks.
Phase One, months 1--6:
- Set up technical components
for OAI-PMH harvesting [Michigan team]
- Harvest all current
available data from DLF institutions [Michigan team]
- Convene
the DLF OAI Best
Practices Working Group to draft documents of best practices for OAI records
and collection descriptions, drawing heavily on the collective experience
of DLF members who have already created OAI harvesting services and are
actively researching the next steps, including Emory, Michigan, OCLC,
and UIUC
- Establish the research
project’s Scholarly Advisory Group, and provide them with the means
and encouragement to inform and challenge our decisions and assumptions
throughout the project. [Note: prior to the beginning of the grant, the
DLF convened a meeting of scholars who are deeply involved in
digital projects that draw heavily on digital library resources; we expect
that the Scholarly Advisory Group will grow out of that focus group, which
was established in the spring of 2004; see
http://www.diglib.org/use/scholars0406/.]
- Work on export of registry
records [UIUC Team]
- Experiment with addition
of classification attributes to registry records [UIUC Team]
Phase Two, months 7--12:
- Test the harvesting
mechanisms and resolve any remaining technical problems [Michigan team]
- Plan the training needed
to ease the transition of DLF institutions to the provision of OAI
metadata records for harvesting [Emory team]
- Provide draft versions of
best practices documents for the creation of new OAI records that are
explicitly tailored for digital library services [DLF OAI Best Practices
Working Group]
- Convene a DLF Finding
System Research team, drawing from across DLF institutions; their
first task will be to identify tools used for creation of OAI records at
local institutions, in order to inform the training and guidance provided
to new providers
- Sample metadata records to
look for ways to enrich registry records [UIUC Team]
- Implement experimental
search features that use collection-level description metadata [UIUC
Team]
- Expect to be working with
some new providers by this point
- Commission Martha Brogan
to review and revisit the work she did in 2003 for the DLF in surveying
digital library aggregation services, in part to inform the service design
in the second half of the grant
Phase Three, months
13--18:
- Expect to see the bulk of
new OAI providers by this phase of the grant
- Through significant use of
the Scholarly Advisory Group, design a functional OAI-based finding system
that will form the initial prototype for the DLF’s distributed library
- Work with the Scholarly
Advisory Group to uncover focused collections within the mass of the
aggregated records, and identify individual scholars and/or teachers who
will use that material in their work.
- Explore what level of
normalization of the harvested data is needed [DLF Finding System Research
team]
- Experiment with exposure of
the records to Google, and report on the results [Michigan team]
- Experiment with the use of
OCLC Web services tools for name authority provision, and report on the
results [Michigan team]
- Research techniques to
automatically characterize metadata records available from given OAI
providers [UIUC Team]
- Expand current registry to
include OAI service providers [UIUC Team]
Phase Four, months
19--24:
- Experiment with
existing portals such as OAIster, Internet Scout portal, iVia [Michigan team]
- Explore the collection
development potentials of the prototype [All]
- Gather feedback from
across the IMLS-funded and DLF-funded teams that have contributed to this
research and prototyping [All]
- Revise, document,
and promote results [All]
- Liaise with the DLF teams
(and others) that will be moving forwards after this research to implement
a finding system informed by our research [All]
Management
The Principal Investigator,
together with Co-Principal Investigators, shall provide the overall direction
of the work undertaken here. Project Director/Principal Investigator, David
Seaman, will provide coordination and oversight for the whole project. The three Co-Principal
Investigators--one each at partner institutions Emory, Michigan, and UIUC--have
all been involved in successful digital library projects for funding agencies
including NEH, IMLS and the National Science Foundation. The University of
Michigan Library will host an experimental OAI metadata harvesting service and
metadata discovery portal(s) for the aggregation of DLF-member contributed
metadata. This harvesting service and associated portal(s) will serve as a
test bed for many of the research investigations outlined in the proposal
narrative. UIUC’s Grainger Library will
host the collection registry and metadata repository services created as part
of this project. Grainger Library currently hosts both the UIUC OAI
Metadata Harvesting project and the TDC project.
Personnel
Digital Library
Federation/Council on Library and Information Resources
David Seaman (Project Director/Principal Investigator): has previously run grants
from the Andrew W. Mellon Foundation, NSF (international collaboration grant with
the Deutsche Forschungsgemeinschaft), and NEH (a national challenge grant), and has
been a contributing partner on an IMLS grant (The Philip S. Hench/Walter
Reed Yellow Fever Collection). He has taught, lectured, and written
frequently for the past twelve years on various aspects of digital library and
humanities computing, in particular the use of SGML and XML texts in
large-scale digital library aggregations.
Emory University
Martin Halbert (Co-Principal Investigator): Martin Halbert has been
Director for Library Systems at the Emory University General Libraries since
1996, and has extensive experience in planning and coordinating metadata
harvesting projects. He is currently principal investigator and executive
director for the projects of the MetaScholar Initiative.
A recognized authority on metadata harvesting services, he has spoken on the
topic at a number of national and international conferences.
University of Michigan
Kat Hagedorn (Technical Manager): Work on metadata standards, interface design, and
reporting on current practices will be managed at the University of Michigan by
Kat Hagedorn, the university's Metadata Harvesting Librarian.
She managed OAIster, the OAI project funded by the Andrew W. Mellon Foundation at the
University of Michigan. Her work on OAIster and DLXS will provide the
basis for the development of the new service envisioned in this proposal.
Perry Willett (Co-Principal Investigator): Head of the Digital
Library Production Service, University of Michigan. Perry Willett was previously
Assistant Director for Projects and Services, Digital Library Program
(DLP), Indiana University, and Head of the Library Electronic Text Resource
Service (LETRS).
University of Illinois
Thomas G. Habing (Co-Principal Investigator): Work on the
experimental registry of OAI service and metadata providers will be managed at
the University of Illinois by Thomas G. Habing and carried out by a library
school graduate research assistant under his direction. Mr. Habing has been
active as a developer of OAI-related software and services since the alpha
testing phase of the protocol. He is a co-author of an XML schema for Qualified
Dublin Core and the creator of the UIUC experimental registry of OAI metadata
providers (http://oai.grainger.uiuc.edu/registry).
Project Evaluation
Two
advisory committees will be formed early in this research project to assist
with ongoing evaluation and assessment and to help ensure usefulness and
sustainability potential of this work.
The
first, to be chaired by the project director/principal investigator, David Seaman, will be comprised of selected
staff from DLF member libraries and digital projects. As representative of
librarians and content providers, this advisory committee will help set project
priorities and research agendas and will provide guidance on the resolution of
high-level project issues. This first committee also will assess, evaluate, and
provide ongoing qualitative feedback regarding what the research results tell
us about the potential uses of metadata repositories to support the provision
of advanced library services and their effectiveness as utilities for resource
sharing and digital collection interoperability. Tom Habing (UIUC), Martin Halbert (Emory), and Perry Willett
(Michigan) have all committed to serve on this first advisory
committee. Additional members will be selected
at project start to fill out this group. Members will be selected for their
familiarity with digital library systems and interoperability issues, and to
represent a diverse set of subject domains and digital information resource
knowledge. We anticipate that this committee will meet face-to-face twice
annually during the project, and will confer by email and conference calls as
necessary at other times during the project.
A
second advisory committee will be comprised of approximately 8 DLF member
institution teaching faculty who are well positioned to speak to scholarly user
needs and interests, across all levels of university constituencies (i.e.,
undergraduate, graduate, faculty, and staff). This committee as well will
advise on project priorities and agendas on an ongoing basis, but this second
group also will be especially well-positioned to comment from an end-user
perspective on outcomes of prototyping activities and usability testing of
experimental systems and services developed. The Scholarly Advisory Committee will
serve not only as a key resource for evaluation but also as an aid to
dissemination of results and as a group that can help facilitate sustainability
and transformation of experimental services developed here into the longer term
DLF distributed library. In this role they will convene an end-of-project
conference to disseminate research results and solicit user community input
into next steps. The Scholarly Advisory Committee will also contribute
substantial subject expertise during the content selection/collection
development phase of the project.
Dissemination
The DLF has both its semi-annual Forums and its listserv, Dlf-announce, for
dissemination, and we will be publishing several reports out of this project. In
addition, we will actively seek out appropriate electronic forums, such as
listservs and online discussion lists, in which to alert the wider library and
museum communities of the progress of this project. The participants fully
intend to generate journal articles and conference papers, and to avail themselves of
professional presentations designed to disseminate the findings of the project.
Sustainability
The ultimate goal of creating a finding system for
DLF holdings is to research the usefulness and viability of sharing
collection-level and item-level metadata in the context of digitization
projects within the DLF. The endeavor will accomplish those goals and will
lay the foundation for further exploitation of these technologies and
approaches.
All software, documentation, training
modules, and best practice recommendations developed for the distributed
library service will be publicly available.
With regard to item-level metadata
sharing, a primary objective of the proposed work is to engender a commitment
on the part of DLF institutions and others to implement and maintain metadata
provider services so that metadata may be harvested not only by other DLF
members but by all such interested parties.
The DLF is committed to long-term
research, creation, and support of a range of elements that will go to make up
the open, distributed library that we are ambitious for, with the richer
services and better scholarship that it will engender.
Conclusion
Recently, the DLF’s Steering Committee has unanimously renewed the organization’s original
commitment to an open, distributed, digital library, which means that we start
this research at a moment when the DLF membership is prepared to commit to
build more OAI records, fund related initiatives, and to implement and maintain
a permanent finding system (a prototype of which this grant will provide); it
also means that we are eager for research and demonstration that -- for
example -- encourages new behaviors such as the routine provision of
publicly-available item-level metadata records as part of our daily production
processes, and that provides a richer knowledge of how users expect finding
systems to behave.
The DLF members are
committed to developing the prototype into a large-scale, long-term tool for
discovery and re-use of their rich but scattered digital holdings. This in
turn, we hope, will quickly expand beyond the DLF to much wider participation, and
we will do everything we can to promote, publicize, and empower larger
distributed libraries.
During this
research period we will coordinate closely with our colleagues in other
large-scale library aggregations, especially the National Science Digital
Library (NSDL), and with ongoing research at UIUC. Already we see ways in
which the practical experimentation undertaken here will inform the emerging
discussions about national cyber-infrastructures for both the sciences and the
humanities, and will contribute to the discussions about a National Digital
Library for the Humanities.
The Institute of Museum and Library Services, a federal grant-making agency
dedicated to creating and sustaining a nation of learners by helping libraries
and museums serve their communities, supports the Digital Library Federation.