DLF evaluation of the Open Archives Initiative
The DLF is supporting the development of a small number of Internet gateways through which users will access distributed digital library holdings as if they were part of a single uniform collection. The gateways will be built using a technique known as metadata harvesting. That technique is documented in a technical framework developed by the Open Archives Initiative (OAI). As such, the gateways developed by the DLF will contribute to a practical evaluation of the OAI's harvesting technique and its application within libraries. These pages describe this development work and provide an up-to-date account of its progress.
The evaluation project
Provisional metadata sources
In May 2000, with funding from The Andrew W. Mellon Foundation, two meetings were held at Harvard University to explore various technical and organizational issues involved in the development of metadata harvesting services in digital libraries. The meetings' aims are set out in a funding proposal that also acted to brief invited participants and to focus their discussions.
Participants quickly concluded that there were numerous, potentially very valuable harvesting applications in the digital library. Rather than re-invent a harvesting protocol, however, participants agreed to concentrate on desirable revisions to the protocol that had been developed recently by the OAI and documented in the Santa Fe Convention. The meetings produced three outcomes: a vision statement describing how harvesting services could be developed to the advantage of libraries and their patrons; a set of recommended changes that were formally put to the OAI; and a road map for the development of harvesting services that would help the library community evaluate the OAI's technical framework in particular and the potential value of metadata harvesting in general.
The vision statement was produced in printed and electronic forms, and circulated widely to a broad cross-section of the library community. The statement reflected on libraries' persistent concern to pool the records they had developed in order to document their respective holdings. It evaluated the various mechanisms that had been used to achieve this shared aim (e.g. union catalogs and distributed search services) and the difficulties that those mechanisms encountered in trying to integrate records pertaining to digital as well as non-digital information. Metadata harvesting, the statement suggested, promised to overcome some of these limitations. In addition, the vision statement demonstrated how harvesting could support the construction of Internet portals or gateways; that is, websites that organize access to a rich variety of information resources (potentially in any format) to meet very specific user needs. Thus, the vision statement mused about harvesting services that organized access to information relevant to those interested in a particular field of study (e.g. American history, biomedical ethics), a particular kind of information (e.g. electronic books, digital images, maps and cartography), or information available in a certain region (e.g. the southwestern United States). It also envisaged Internet search services, equivalent to those offered commercially by, for example, Alta Vista and Lycos, but focusing more exclusively on scholarly information, including information that exists in databases and is therefore hidden from the commercial search engines' view. In this regard, the statement suggested that the harvesting technique could be used to build what members of the Association of Research Libraries were at that time beginning to refer to as "the scholarly commons".
The second result of the Harvard meetings was a set of three recommendations that were taken formally to the OAI. These were intended to help the OAI generalize the framework so that it could be applied beyond the e-print community where it had originated. The first recommendation was technical and urged adoption of unqualified Dublin Core as the protocol's common metadata element set. The OAI had originally proposed an Open Archives Metadata Set that was smaller and more prescribed than the Dublin Core, and more closely tailored to the needs of the e-print community. The second recommendation called for greater organizational stability for the OAI, including a steering committee, an official home for the OAI web site, and a clear locus of responsibility for maintenance of the protocol. The aim here was to stabilize the framework long enough to encourage institutions to invest in its practical application. The third recommendation sought to generalize the initiative by focusing it on technical rather than operational issues. Hitherto, the technical framework had been developed to support a particular application that aimed at making electronic pre-print publications more widely accessible and without cost to end-users. Participants in the Harvard meetings envisaged (and did not want to constrain development of) applications that reflected very different organizational and business objectives. These three recommendations were among those discussed by the Open Archives Initiative at its second public meeting in San Antonio, Texas, in June 2000, and helped to encourage developments that are reported elsewhere in these pages.
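As a rough illustration of why a common element set matters to a harvesting service, the Python sketch below merges records from two data providers into one index and searches it by Dublin Core element. It is a minimal sketch under stated assumptions: the `records`/`record` wrapper elements, the provider names, the sample records, and the function names are all invented for illustration and do not represent the OAI's wire format. Only the Dublin Core element namespace is real, and a working harvester would fetch the metadata over HTTP rather than read it from inline strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the unqualified Dublin Core elements (title, subject, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical harvested payloads, keyed by an invented provider name.
# The <records>/<record> wrapper is an assumption made for this sketch.
HARVESTED = {
    "library-a": """
<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:identifier>a:001</dc:identifier>
    <dc:title>Maps of the Southwest</dc:title>
    <dc:subject>cartography</dc:subject>
  </record>
  <record>
    <dc:identifier>a:002</dc:identifier>
    <dc:title>Letters on American History</dc:title>
    <dc:subject>American history</dc:subject>
  </record>
</records>""",
    "library-b": """
<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:identifier>b:001</dc:identifier>
    <dc:title>Atlas of Texas</dc:title>
    <dc:subject>Cartography</dc:subject>
  </record>
</records>""",
}


def parse_records(source, xml_text):
    """Turn one provider's XML into a list of {source, fields} dicts."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iter("record"):
        fields = defaultdict(list)  # Dublin Core elements are repeatable
        for child in rec:
            if child.tag.startswith("{" + DC_NS + "}"):
                name = child.tag.split("}", 1)[1]
                fields[name].append((child.text or "").strip())
        records.append({"source": source, "fields": dict(fields)})
    return records


def build_index(harvested):
    """Pool every provider's records into a single flat list."""
    index = []
    for source, xml_text in harvested.items():
        index.extend(parse_records(source, xml_text))
    return index


def search(index, element, term):
    """Case-insensitive substring match on one Dublin Core element."""
    term = term.lower()
    return [
        rec for rec in index
        if any(term in value.lower() for value in rec["fields"].get(element, []))
    ]


if __name__ == "__main__":
    pooled = build_index(HARVESTED)
    for hit in search(pooled, "subject", "cartography"):
        print(hit["source"], hit["fields"]["title"][0])
```

Because both providers describe their holdings with the same element names, the service can query `subject` across the pooled records without any per-provider mapping, which is the practical benefit the first recommendation was after.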
The road map for developing harvesting services involved progress on two closely related fronts: developing a pool of harvestable metadata focusing principally on metadata available from library systems; and building a small number of online services with metadata harvested from the pool. The work is being undertaken in close collaboration between the DLF (whose 25 member libraries share an interest in integrating access to their distributed collections) and The Andrew W. Mellon Foundation. An account of its progress is set out below.
In June 2000 the DLF began construction of a simple database to list its members' nearly 300 public domain online digital collections. The database, available from http://www.hti.umich.edu/cgi/d/dlfcoll/dlfcoll-idx, was developed in part to identify sources of harvestable metadata upon which the prototype harvesting services might rely.
Also in June, The Andrew W. Mellon Foundation invited the DLF to locate institutions interested in contributing to evaluation projects either by contributing metadata or by harvesting metadata and building services. A call for expressions of interest issued by the DLF in July produced 13 responses. Responses described 9 or 10 potential harvesting services. They also offered metadata from nearly 50 digital library collections representing well over a million unique information objects.
In October 2000, a meeting of interested project participants was convened by The Andrew W. Mellon Foundation to explore technical, organizational, and resource issues and to identify possible next steps. The meeting is reported fully elsewhere. Briefly, participants identified at least four service types with particular possibilities for libraries, including:
Finally, there was interest in using harvesting services to integrate information about digital as well as non-digital objects and, in this way, to capitalize on the substantial scholarly wealth represented, for example, in union bibliographic databases and online archival finding aids.
At present, those who have offered metadata are building OAI-conformant services that will allow them to make the metadata available for harvesting. A provisional list of the metadata collections to be made available for harvesting is supplied below. In the meantime, discussions with potential service developers continue.
In June 2001, The Andrew W. Mellon Foundation funded seven projects that proposed construction of various online services employing the OAI metadata harvesting protocol. The following services were funded at DLF member institutions:
Many other DLF members are contributing metadata to these harvesting services.