From innovation to best practice
Framing a DLF-NISO initiative to identify, document, and evaluate digital libraries' applications of various data and metadata formats
D Greenstein
13 April 2000
Introduction
This document summarizes numerous discussions about an initiative to inform digital libraries with regard to their application and use of various data formats and metadata schemes. The initiative takes on a high priority for DLF member institutions and reflects two of their concerns:
There is also hope that in selected cases and over the longer term, such an initiative will exercise some influence over how various data formats and metadata standards are applied by data creators who supply content into the digital library's "collections".
This document outlines six steps that have emerged from discussions and supplies a draft problem statement intended to launch work in this area.
II Next steps
Step 1. Agree on a problem statement that can focus the initiative
A draft is supplied here for consideration and comment. It represents a starting point only.
In a digital age, the library develops services that support the location, retrieval, exploitation and, in some cases, long-term management of deeply heterogeneous networked information resources. Some of these resources are produced by the library (e.g. digital surrogates, finding aids and catalogues). Most are produced by third parties over which the library exercises little or no control.
In developing these various services, the digital library contends with a wide range of data formats and metadata schemes. Diversity is not tied exclusively to the existence of different data formats (e.g. raster graphics, ASCII texts, GIS) and metadata schemes (e.g. the TEI Header, the VRA Core, the FGDC) but also to the fact that there are few common implementations of any single format or scheme. Instead, implementations are tied to the needs and interests of particular data producers and/or the end-user communities they supply. They rarely take account of the needs of libraries as organizations that are responsible for a variety of mediating services.
The impact on the library is considerable. At present it tends to operate on an ad hoc basis by tailoring its various mediating services for individual networked resources as they are included in its "collection". A more cost-effective and scalable approach - one that relies on the application of generalizable digital library services and tools - requires a greater degree of consistency across networked information resources than currently exists.
The initiative proposed here does not suggest that consistency will be achieved through rigid adherence to prescriptive data and metadata standards. Rather, it suggests that digital libraries must make informed decisions about how to implement such standards locally in the search, retrieval, and other mediating online services that they need to develop. For example, digital libraries are unlikely to advocate a specific implementation of the VRA Core. They may none the less offer guidance on the bare minimum requirements that will enable the library to fulfill its mediating functions for networked information resources that implement the VRA Core. In selected cases, it may be possible for libraries collectively to come to some agreement with regard to good or even best practice and to use that consensus to exercise some influence over the data creators upon which the library increasingly depends for its "collections".
Working with data and metadata formats that are commonly encountered by digital libraries, the initiative proposes to identify and share information about current practice and to evaluate that practice from the perspective of a digital library that must cost-effectively offer a range of mediating online services. At a minimum, the initiative will develop decision tools that will inform individual libraries as they develop services capable of managing networked information resources that exploit various data formats and metadata schemes.
While evaluating practice, the initiative may, in certain cases, be capable of identifying consensus that surrounds "best practice", and of using that consensus to exercise some influence over data-producing and data-using communities.
Step 2. Convene a small group willing to guide the initiative and begin the following immediate tasks:
Step 3. Develop an inventory of documented practice or implementation guidelines that exist for digital libraries with the selected data formats and metadata schemes
I have already identified a number of local implementation guidelines for various metadata schemes, e.g. EAD, the VRA Core, the Dublin Core, the TEI, and the DDI Codebook (social science data). There is also some very good work reviewing various image formats and their use, and some emerging material relevant to GIS. It may be worth assembling an inventory to inform the work of any expert working groups that get going.
Step 4. Establish some guidelines that might be used by any expert working groups that we convene to critically evaluate that practice with a view to developing decision tools and, where possible, identifying good or best practice
Step 5. Establish some dissemination path that ensures that any work is reviewed and validated by the broadest relevant community
This will be essential in those [few?] areas where we find opportunities to move beyond the development of decision tools to recommending good or best practice.
Step 6. Convene expert working groups as appropriate