
Registry of Digital Reproductions of Paper-based Books and Serials

Functional Requirements

July 24, 2001


This document offers a functional specification for a registry that records information about digital reproductions of books and serials. It has been produced as part of the DLF's work to define requirements for and encourage the development of such a registry service. An introduction to that work and additional documentation on it is available from http://www.diglib.org/collections/reg/reg.htm.

Data registered

The Registry must have the ability to record (or, in some instances, to link to) the following information:

Bibliographic Description. Reproductions should be described using MARC21 and contemporary cataloging rules. Given that the original materials are very likely already cataloged in traditional library format, this should be an easy and inexpensive process. Records should meet the standards for minimal content described in National Level Record - Bibliographic - Full Level & Minimal Level (http://www.loc.gov/marc/bibliographic/nlr/).

Precise Holdings. For multi-volume and "continuing" works such as journals, a description should be recorded in standard holdings notation of the precise issues and volumes that have been digitized. If forms of "compressed" holding statements are used, they MUST be understood to imply that every volume and issue of the encompassed run has been digitized.

Information About Use Copy. The following information should be recorded about the use copy (a network-accessible, but not necessarily free, copy of every registered object must be available as a condition for inclusion in the Registry):

  • a URL or URN providing a persistent link to the use copy;
  • a textual description of the terms and conditions for access to the use copy;
  • the technical format, if the materials are not simply available through a standard web interface.

Information About Archival Master Copy. The following information should be recorded about the archival master copy (because persistence is an assumed responsibility of the registering agency, a Master Copy must be described):

  • a persistent identifier for the master object (this does not need to be an "actionable" link such as a URL or URN -- just an unambiguous identifier that the owner will recognize);
  • a textual description of the accessibility of the Master Copy (who can access it and under what terms and conditions);
  • a description or a pointer (such as a URL or a standard identifier) to a description of the technical standards used in creating the Master Copy (note that this is a key element: it is expected that materials will be digitized following many differing practices. In order for other institutions to rely on a master, they need to be satisfied that it is of sufficient quality. The expectation is that there will be standard best practice guidelines created by the digital library community that libraries can simply "point" to via identifier or URL when appropriate. When community standards or best practices are being followed, it is highly desirable that the name or identifier of such practices be recorded rather than a URL-based link);
  • the technical format of the Master Copy (again, this is likely recorded as a pointer to a description of the format used);
  • a description or a pointer to a description (such as a URL or a standard identifier) of the repository practices being followed in the storage and maintenance of the Master Copy (as noted, efforts are currently underway to define best practices in this area).

    If URLs are used to point to external descriptions of practice in any of the fields above, the recording institution must assume the responsibility of maintaining the validity of the link.

    (NOTE that if use and archival master copy are the same, the information should be repeated in both areas.)
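The use copy and master copy fields listed above could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names, and the `missing_fields` helper are hypothetical and are not part of the specification, which expects the data to be carried in MARC21 records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UseCopy:
    persistent_link: str                    # URL or URN for the use copy
    access_terms: str                       # textual terms and conditions
    technical_format: Optional[str] = None  # only if not a standard web interface


@dataclass
class MasterCopy:
    identifier: str            # need not be an actionable link
    accessibility: str         # who can access it, and on what terms
    creation_standards: str    # description, or pointer to one
    technical_format: str      # description, or pointer to one
    repository_practices: str  # description, or pointer to one


@dataclass
class RegistryEntry:
    bib_record_id: str         # hypothetical key to the bibliographic description
    use_copy: UseCopy
    master_copy: MasterCopy


def missing_fields(entry: RegistryEntry) -> List[str]:
    """Return the names of required fields that are empty."""
    problems = []
    for section_name in ("use_copy", "master_copy"):
        section = getattr(entry, section_name)
        for field_name, value in vars(section).items():
            if section_name == "use_copy" and field_name == "technical_format":
                continue  # optional for use copies
            if not value or not value.strip():
                problems.append(f"{section_name}.{field_name}")
    return problems
```

Both sections are mandatory in this sketch because the text requires a use copy for inclusion and a described Master Copy for every registered object; when the two copies are the same, the same values would simply be entered in both.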

Statement of intent to digitize. In order to avoid unnecessary duplicate conversions, libraries are encouraged to record their intent to digitize an object as soon as the definite decision to do so is made. The statement should include the projected date of completion. The MARC21 583 field can be used to record this information. A problem with the use of a queuing mechanism of this sort for microfilming activity has been that queued items are not always updated to indicate that the filming has actually taken place. In order to avoid this difficulty in the digital registry:

  • a name and contact information should be recorded for each queued item;
  • the registry system should send a tickler message for each item that remains queued more than 90 days after the expected date of digitization;
  • the registry should delete any queuing information more than a year past the expected date of digitization.
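The two time limits above amount to a simple rule that a registry system could evaluate for each queued item. The sketch below is one possible reading, not a prescribed implementation: the function name and return values are hypothetical, and "a year" is approximated as 365 days.

```python
from datetime import date, timedelta

TICKLER_AFTER = timedelta(days=90)   # from the requirement above
DELETE_AFTER = timedelta(days=365)   # assumption: "a year" taken as 365 days


def queue_action(expected: date, today: date) -> str:
    """Decide what to do with an item still queued on `today`."""
    overdue = today - expected
    if overdue > DELETE_AFTER:
        return "delete"    # drop the stale statement of intent
    if overdue > TICKLER_AFTER:
        return "tickler"   # remind the recorded contact
    return "keep"
```

For example, `queue_action(date(2001, 1, 1), date(2001, 4, 2))` evaluates to `"tickler"`, since 91 days have passed since the expected date.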

Information from Multiple Sources

There will be many cases in which more than one institution will need to register digital copies of the same bibliographic item. In particular, one can expect that for multi-part items such as journals, the entire bibliographic item may not be available from a single institution, and that the record in the Registry should show that one institution has digitized some volumes, and another institution other volumes. Likewise there may be instances in which two institutions digitize the same item in different formats or to different standards. The Registry should provide a unified and coherent view of all digital versions registered.
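One way to picture such a unified view is to group registrations from different institutions under the bibliographic item and volume they cover. The sketch below assumes a deliberately simplified input, a tuple of item identifier, institution name, and volume number; real registrations would carry standard holdings statements rather than bare integers, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (bibliographic item id, institution, volume number): hypothetical shape
Registration = Tuple[str, str, int]


def unified_view(
    registrations: Iterable[Registration],
) -> Dict[str, Dict[int, List[str]]]:
    """Map each item to its digitized volumes and who digitized them."""
    view: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for bib_id, institution, volume in registrations:
        view[bib_id][volume].add(institution)
    # Convert to plain dicts with sorted volumes and institution names.
    return {
        bib_id: {vol: sorted(insts) for vol, insts in sorted(volumes.items())}
        for bib_id, volumes in view.items()
    }
```

A volume listed under two institutions corresponds to the case of the same item being digitized more than once, possibly in different formats or to different standards.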


Registered information must be available to users in two ways:

Interactive search. Records of registered materials must be interactively searchable through standard information retrieval queries based on normal MARC bibliographic elements. This search facility is primarily intended for use by library staff looking to see if a known item has been digitized. If registered materials are integrated into a larger bibliographic file, searching must provide the ability to limit results to registered digital copies. Because searching is only intended to support library staff, a naïve general user interface is not required.

Harvesting. Registered catalog data must be available for harvesting, supporting at a minimum the Open Archives Initiative (OAI) protocols. Metadata formats supported should include both MARC21 and Dublin Core. There is no specific requirement that sub-setting of registered data during harvesting via the OAI "set" functions be supported.
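As a rough illustration of what a harvester would send, the sketch below builds a record-listing request using the `ListRecords` verb and `metadataPrefix` argument of the OAI harvesting protocol. The base URL is a placeholder, and the `marc21` prefix mentioned afterwards is an assumed name: the protocol defines `oai_dc` for Dublin Core, while prefixes for other formats are chosen by each repository.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str) -> str:
    """Build an OAI ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical registry endpoint, used only for illustration.
dublin_core_request = list_records_url("http://registry.example.org/oai", "oai_dc")
```

A MARC21 harvest would use the same call with whatever prefix the Registry advertises for that format (for example `marc21`). No `set` argument is included, in line with the statement that sub-setting need not be supported.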

Easy accessibility should be provided to the entire international library community as well as to other institutions and companies (both commercial and non-commercial) engaged in digital conversion efforts.

Visibility as entity

It is important that the Registry be visible and have a recognized name to encourage contribution and use. While it is highly desirable that registered data be accessible as part of a larger bibliographic file, some means of identifying the Registry (including, but not limited to, the ability to access only registered materials in searching and harvesting as discussed above) and making its utility visible should be provided.

Input and Maintenance

Data input and maintenance should be available through both interactive on-line transactions for individual records, and in batch mode for groups of records. A "derive" function, allowing the majority of the bibliographic description of materials to be copied from existing records for paper originals, is highly desirable. It is expected that the original registering institution will need to be able to update information in the Registry after initial input, to record such things as a change in status from intended to actually digitized, additional volumes digitized, and changes in the format of master or use copies. Additionally, other institutions will need to be able to add information about other digital versions created. The ability to contribute and maintain data should be easily and readily available to the entire library community and to other institutions and companies (both commercial and non-commercial) engaged in digital conversion efforts.
