D Greenstein
13 April 2000


This document summarizes numerous discussions about an initiative to inform digital libraries with regard to their application and use of various data formats and metadata schemes. The initiative takes on a high priority for DLF member institutions and reflects two of their concerns:
  • to reduce the uncertainties that surround digital library investments;
  • to learn from one another's experience in managing a growing number of data formats and metadata schemes in an integrating digital library service environment.

There is also hope that, in selected cases and over the longer term, such an initiative will exercise some influence over how various data formats and metadata standards are applied by data creators who supply content to the digital library's "collections".

This document outlines six steps that have emerged from discussions and supplies a draft problem statement intended to launch work in this area.

II Next steps

Step 1. Agree on a problem statement that can focus the initiative

A draft is supplied here for consideration and comment. It represents a starting point only.

In a digital age, the library develops services that support the location, retrieval, exploitation and, in some cases, long-term management of deeply heterogeneous networked information resources. Some of these resources are produced by the library (e.g. digital surrogates, finding aids and catalogues). Most are produced by third parties over which the library exercises little or no control.

In developing these various services, the digital library contends with a wide range of data formats and metadata schemes. Diversity is not tied exclusively to the existence of different data formats (e.g. raster graphics, ASCII texts, GIS) and metadata schemes (e.g. the TEI Header, the VRA Core, the FGDC) but to the fact that there are few common implementations of any single format or scheme. Instead, implementations are tied to the needs and interests of particular data producers and/or the end-user communities they supply. They rarely take account of the needs of libraries as organizations that are responsible for a variety of mediating services.

The impact on the library is considerable. At present it tends to operate on an ad hoc basis, tailoring its various mediating services for individual networked resources as they are included in its "collection". Another, more cost-effective and scalable approach - one that relies on the application of generalizable digital library services and tools - requires a greater degree of consistency across networked information resources than currently applies.

The initiative proposed here does not suggest that consistency will be achieved through rigid adherence to prescriptive data and metadata standards. Rather, it suggests that digital libraries must make informed decisions about how to implement such standards locally in the search, retrieval, and other mediating online services that they need to develop. In this respect, digital libraries are unlikely to advocate a specific implementation of the VRA Core. They may nonetheless offer guidance with regard to some very minimum requirements that will enable the library to fulfill its mediating functions for networked information resources that implement the VRA Core. In selected cases, it may be possible for libraries collectively to come to some agreement with regard to good or even best practice and to use that consensus to exercise some influence over the data creators upon which the library increasingly depends for its "collections".

Working with data and metadata formats that are commonly encountered by digital libraries, the initiative proposes to identify and share information about current practice and to evaluate that practice from the perspective of a digital library having cost-effectively to offer a range of mediating online services. At a minimum, the initiative will develop decision tools that will inform individual libraries as they develop services capable of managing networked information resources that exploit various data formats and metadata schemes.

While evaluating practice, the initiative may, in certain cases, be capable of identifying consensus that surrounds "best practice" and of using that consensus to exercise some influence over data producing and data using communities.

Step 2. Convene a small group willing to guide the initiative and to launch the following immediate tasks:

  1. establish criteria against which current practices may be evaluated. Here we are likely to emphasize the digital library's unique perspective as an operational service environment that is required to develop, cost-effectively, a range of search, retrieval, data management, and other mediating services for a wide range of extensively distributed and deeply heterogeneous networked information resources. We may also wish to emphasize how a digital library's use of any one data format and metadata scheme may vary with respect to the service context in which it is applied. Thus, TIFF images with specific attributes may be appropriate as archival or master copies but less appropriate for web-based access.
  2. select data formats and metadata schemes where we feel it will be possible to invest our efforts. Here, we may wish to distinguish between data formats and metadata schemes with which the library community already has extensive experience (e.g. ASCII texts, EADs, TEI headers) and those where current practice is itself an act of experimentation.
  3. identify individuals who may be able and willing to participate in expert working groups convened to review practice with selected data formats and metadata standards. The composition of these expert groups will probably need to be considered on a case-by-case basis. In general, one suspects they will want to include experts drawn from data creating, user, and other communities that have a stake in the data format or metadata scheme under consideration. Such inclusiveness is essential if digital libraries' needs and preferences are to have any influence over practices that evolve outside the library community. It will also be important where organizational support is required for the maintenance work that may be associated with emerging good or best practices.
  4. review any outputs arising from those expert working groups and advise on how those outputs are disseminated.
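
By way of illustration only, the point made under task 1 (that the suitability of a format depends on the service context) might be sketched as a very small decision tool. Everything in the sketch is an assumption introduced for the example: the attribute names, the two service contexts, and above all the thresholds, which are placeholders and not recommendations arising from this initiative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageProfile:
    """Attributes of one image file. Field names are illustrative only."""
    file_format: str        # e.g. "TIFF", "JPEG"
    compression: str        # e.g. "none", "lzw", "jpeg"
    pixels_per_inch: int
    size_megabytes: float


def suits_archival_master(profile: ImageProfile) -> bool:
    # Placeholder criterion: uncompressed or losslessly compressed TIFF
    # at a hypothetical minimum resolution.
    return (
        profile.file_format == "TIFF"
        and profile.compression in {"none", "lzw"}
        and profile.pixels_per_inch >= 300
    )


def suits_web_access(profile: ImageProfile) -> bool:
    # Placeholder criterion: small enough to deliver over the network.
    return profile.size_megabytes <= 1.0


# Each service context is paired with its own criterion, so the same file
# can be judged differently depending on where it is to be used.
CRITERIA = {
    "archival master": suits_archival_master,
    "web access": suits_web_access,
}


def evaluate(profile: ImageProfile) -> dict:
    """Report, per service context, whether the profile meets the criterion."""
    return {context: check(profile) for context, check in CRITERIA.items()}


master = ImageProfile("TIFF", "none", 600, 45.0)
print(evaluate(master))
```

The same TIFF passes the archival criterion and fails the web criterion, which is the distinction task 1 draws. Real criteria would be for the guiding group and any expert working groups to settle.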

Step 3. Develop an inventory of documented practice or implementation guidelines that exist for digital libraries with the selected data formats and metadata schemes

I have already identified a number of local implementation guidelines for various metadata schemes, e.g. EAD, the VRA Core, the Dublin Core, the TEI, and the DDI Codebook (social science data). There is also some very good work reviewing various image formats and their use, and some emerging material relevant to GIS. One wonders whether it might be worth assembling an inventory to help inform the work of any expert working groups that get going.

Step 4. Establish some guidelines that might be used by any expert working groups that we convene to critically evaluate that practice with a view to developing decision tools and, where possible, identifying good or best practice.

Step 5. Establish some dissemination path that ensures that any work is reviewed and validated by the broadest relevant community

This will be essential in those [few?] areas where we find opportunities to move beyond the development of decision tools to recommending good or best practice.

Step 6. Convene expert working groups as appropriate

