Draft benchmark for digital reproductions of printed books and serial publications

30 July 2001

This document recommends a minimum benchmark for digital reproductions of printed texts and serial publications. It also outlines the importance, rationale, and implications of such a benchmark.

Work defining the benchmark grew out of the DLF's investigation into the need for and functional specification of a service through which libraries could register information about the digitally reformatted book and serial publications they had produced (see http://www.diglib.org/collections/reg/reg.htm).

Although the registry service envisaged by the DLF is not exclusive (it will be able to record information about the large and valuable legacy of digitized books and serials) its existence and use will provide an opportunity to identify and build consensus around minimum characteristics that might be expected generally of a faithful reproduction created from this day forward. The benchmark recommended in this document is intended, in part, to launch that consensus-building process.

The recommendation has been prepared by a working group of the Digital Library Federation (DLF) and is being circulated to DLF member institutions for their review, comment, and ultimate endorsement (a report of the group's work is available from the DLF's website).

The review period will last three months, from 31 July to 31 October 2001, during which time comments should be sent to dlf@clir.org.

Should the recommendation prove acceptable to the DLF membership, the benchmark as revised will be posted on the DLF website and announced to the broader community, together with an indication of the DLF's endorsement.

Contents

  1. What is a preservation digital master
  2. Why is it important to build consensus around a preservation digital master
  3. Rationale behind recommendations pertaining to a preservation digital master
  4. Implementation issues
  5. Appendix. Draft list of structural metadata elements that should be required for preservation digital masters

1. What is a preservation digital master

A preservation digital master is a digital facsimile that is a faithful rendering of a printed text (including texts with illustrations and rare and early printed texts).

A preservation digital master must include digital page images.

The page images of a digital preservation master will have or exceed the following minimum level characteristics:

  • Printed texts (may include simple line drawings, descreened halftones): 600 dpi, 1-bit TIFF image using ITU-T6 compression (may be dithered up from a 400 optical dpi 1-bit image)
  • Illustrated texts, black and white: 400 dpi, 8-bit TIFF (uncompressed or using lossless compression)
  • Illustrated texts, color: 400 dpi, 24-bit TIFF (uncompressed or using lossless compression) for color illustrations
  • Rare and early printed texts: 400 dpi, 8- or 24-bit TIFF (uncompressed or using lossless compression)
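
As a rough illustration, the minimum characteristics listed above can be expressed as a small lookup and check. This is a minimal sketch and not part of the benchmark: the category keys, the `PageImage` type, and the `meets_benchmark` function are hypothetical names, and treating a higher resolution or bit depth as "exceeding" the minimum is an assumption. The sketch does not model the allowance for 600 dpi bitonal images derived from a 400 optical dpi capture.

```python
from dataclasses import dataclass

# (minimum dpi, minimum bit depth) per category, transcribed from the list above.
# The category keys are hypothetical labels chosen for this sketch.
MINIMUMS = {
    "printed_text": (600, 1),
    "illustrated_bw": (400, 8),
    "illustrated_color": (400, 24),
    "rare_early": (400, 8),
}


@dataclass(frozen=True)
class PageImage:
    dpi: int            # capture resolution in dots per inch
    bit_depth: int      # 1 (bitonal), 8 (grayscale) or 24 (color)
    file_format: str    # e.g. "TIFF"
    lossless: bool      # True if uncompressed or losslessly compressed


def meets_benchmark(image: PageImage, category: str) -> bool:
    """Return True if the image meets or exceeds the minimum for its category."""
    min_dpi, min_bits = MINIMUMS[category]
    return (
        image.file_format.upper() == "TIFF"
        and image.lossless
        and image.dpi >= min_dpi
        and image.bit_depth >= min_bits
    )
```

For example, a 600 dpi bitonal TIFF passes as a printed text, while a 300 dpi grayscale TIFF fails as an illustrated text.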

Preservation digital masters must have descriptive, structural and administrative metadata, and the metadata must be made available in well-documented formats. Structural metadata must include page level information e.g. as required for page turning and related application software. A minimum list of structural metadata elements is recommended in the appendix.

Preservation digital masters may include machine-readable text as follows:

Either:

uncorrected OCR,

or

corrected OCR that is below 99.995% accurate,

or

corrected text (keyboarded or OCR) that is at or above 99.995% accurate

As well as:

text that is encoded (at any level, e.g. as specified in TEI Text Encoding in Libraries. Guidelines for Best Encoding Practices. Version 1.0, July 30, 1999)
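
To make the 99.995% threshold concrete: it allows at most one error per 20,000 characters, assuming accuracy is measured per character (this document does not state the unit of measurement). The sketch below classifies machine-readable text into the three categories above. The function names are hypothetical, and integer arithmetic is used so that the boundary case is not affected by floating-point rounding.

```python
def meets_accuracy_threshold(total_chars: int, error_chars: int) -> bool:
    """True if (total - errors) / total >= 99.995%, compared with integers."""
    if total_chars <= 0 or not 0 <= error_chars <= total_chars:
        raise ValueError("need total_chars > 0 and 0 <= error_chars <= total_chars")
    correct = total_chars - error_chars
    # correct / total >= 99995 / 100000, cross-multiplied to avoid division
    return correct * 100_000 >= 99_995 * total_chars


def classify_text(corrected: bool, total_chars: int, error_chars: int) -> str:
    """Map a text to one of the three categories listed in the benchmark."""
    if not corrected:
        return "uncorrected OCR"
    if meets_accuracy_threshold(total_chars, error_chars):
        return "corrected text at or above 99.995% accurate"
    return "corrected OCR below 99.995% accurate"
```

Under this reading, a corrected 20,000-character text with one error meets the threshold, and the same text with two errors does not.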

2. Why is it important to build consensus around preservation digital masters

By agreeing to a minimum level benchmark for a preservation digital master, libraries and other organizations can reduce the risk involved in the production and maintenance of digitized texts while inspiring confidence in and encouraging their use.

Because a preservation digital master will be considered by the community as a digital object that is able to meet anticipated current and future needs, an organization creating the preservation digital master can invest in digitization secure in the knowledge that it will not be forced to re-digitize the object at some future date even as production techniques improve.

Users, meanwhile, will develop confidence in preservation digital masters because the masters have a minimum level of well-known and consistent properties and will support a wide variety of uses (including uses not possible with printed texts).

As access to printed texts shifts increasingly to digital preservation masters and their derivatives, collection managers may begin to investigate alternative means for responsibly and non-redundantly preserving the printed texts (or artifacts) from which they are produced; for example, establishing a network of specialist print repositories.

In particular, by building consensus around the characteristics of preservation digital masters, libraries and other organizations that produce and support access to printed texts will be able more effectively to:

  • Write contracts with vendors who offer digitization services and compare vendors' pricing structures - the preservation digital master will be the base level production quality that can be required of vendors and form a baseline for price comparison
  • Commit to making preservation digital masters accessible over the longer term - preservation digital masters will be invested with an intrinsic value that makes them worth maintaining
  • Level up digitization efforts to a point where digital objects are known to have a certain quality capable of supporting production of various derivatives and thus various uses, users, and user needs
  • Instill confidence in users who will know that preservation digital masters support their needs, enable projection and detailed review of anomalies that may exist in the source text, and enable print reproduction of a quality equivalent to or better than that achieved by photocopying directly from the source text
  • Create objects with optimal and well known processability
  • Define and narrow preservation options, e.g., as may be required to migrate the preservation digital masters through changing technical regimes
  • Motivate investment in digitization as a strategy for managing collections of printed texts
  • Supply guidance to funding and other agencies that invest in digitization or otherwise exercise some strategic influence over the networked scholarly information landscape

It is important also to be specific about what consensus about preservation digital masters will not, should not, and is not intended to do.

  • It is not intended to promote or to define methods for creating digital replacement copies for the source documents. Rather, we see consensus around benchmark preservation digital masters as an essential step that will allow us as a community to implement the means of responsibly and effectively managing, preserving, and conserving our printed heritage.
  • It is not intended as an absolute statement of best practice - one that assumes digitization methods won't improve. Methods will continue to improve and our understanding of best practice will improve with them. That is why preservation digital masters are defined as digital objects with a certain number of minimum level characteristics.
  • It is not intended to diminish the importance of legacy collections that were made at lower than recommended levels, to encourage their poor management, or to force their re-scanning. The recommendation is prospective. It suggests that from this date forward, preservation digital masters will have or exceed the minimum characteristics that are documented here.

3. Rationale behind recommended benchmarks

3.1. Book illustrations

  • In considering benchmarks for book illustrations, the illustrations were considered as parts of books rather than as unique objects
  • Book illustrations require different benchmarks than printed text. In particular, they require greater bit depth because
    • the print processes used with illustrations offer fine granular detail that can be lost when captured bitonally
    • color cannot be captured with bitonal images

3.2. Rare and early printed materials

  • Rare and early printed materials require different benchmarks than circulating printed texts because
    • The digital preservation masters will be used by scholars and others whose research requires detailed information about the printed material as a physical artifact. As such the digital preservation masters will require greater bit depth
    • The printing used in their production is often very fine and includes a great deal of variation
    • Rare books often contain marginalia or annotations that are best captured tonally.
  • Given the nature of rare and early printed materials and the use to which their preservation digital masters may be put, a case can be made for setting the benchmark minimum resolution at 600 dpi. At the same time, it is recognized that 600 dpi may incur higher costs with little to no appreciable gain in information capture, so institutions may prefer to digitize at a lower resolution, e.g., 400 dpi. For these reasons, the benchmark includes both 400 and 600 dpi.
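
One component of the cost difference between 400 and 600 dpi can be estimated from uncompressed image size, which grows with the square of the resolution. The sketch below is illustrative only: the 6 x 9 inch page is a hypothetical example, the function name is an assumption, and real TIFF files add header and row-padding overhead that is ignored here.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Approximate uncompressed raster size in bytes for one page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to whole bytes; matters only for 1-bit images.
    return (pixels * bit_depth + 7) // 8


# Hypothetical 6 x 9 inch page captured as 8-bit grayscale.
size_400 = uncompressed_bytes(6, 9, 400, 8)   # 2400 x 3600 pixels
size_600 = uncompressed_bytes(6, 9, 600, 8)   # 3600 x 5400 pixels
ratio = size_600 / size_400                   # (600 / 400) ** 2 = 2.25
```

For this hypothetical page, moving from 400 to 600 dpi multiplies the uncompressed size by 2.25, from about 8.6 million to about 19.4 million bytes at 8 bits per pixel.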

4. Implementation issues

Which benchmark levels are selected and applied in a digitization project, and indeed whether that effort digitizes at or above the benchmark level at all, will be determined locally with respect to a number of factors:

  • extent and nature of illustrations in the source material being digitized
  • intended use of the digital preservation master
  • scale and cost of digitization effort

How libraries characterize early and rare printed materials will be a local decision based on local collections and collection expertise. The Rare Books and Manuscripts Section of ACRL has issued guidelines on the selection of general collection materials for transfer to special collections that provide useful criteria for determining what constitutes an early or rare printed item:

Books may possess intellectual value, artifactual value, or both. Items with artifactual value include finely printed or bound books, those containing plates, valuable maps, or manuscripts, annotations, drawings or other original art work, including tipped-in photographs, or those published prior to a certain date (e.g., before 1800). Other categories on which there is wide, but not always general agreement, include:


a. fine bindings;
b. early publishers' bindings;
c. extra-illustrated volumes;
d. books with significant provenance;
e. books with decorated endpapers;
f. fine printing;
g. printing on vellum or highly unusual paper;
h. volumes or portfolios containing unbound plates;
i. books with valuable maps or plates;
j. books by local authors of particular note;
k. material requiring security (e.g., books in unusual formats, erotica or materials that are difficult to replace);
l. novels with dust jackets containing important information (e.g., text, illustrative design, and prices).

The rarity and importance of individual books are not always self-evident. Some books, for example, were produced in circumstances which virtually guarantee their rarity (e.g., Confederate imprints). Factors affecting importance and rarity can include the following:

  1. desirability to collectors and the antiquarian book trade;
  2. intrinsic or extrinsic evidence of censorship or repression;
  3. seminal nature or importance to a particular field of study or genre of literature;
  4. restricted or limited publication;
  5. the cost of acquisition.

Appendix. Draft list of structural metadata elements that should be required for preservation digital masters

M = Mandatory; MA = Mandatory if applicable; O = Optional

All Materials

Relationship to Other Resources (MA)

Metadata Locations (M)

Start image/page (M)

End image/page (M)

Monographs

Title page (M)

Copyright page (M)

Table of contents (M)

List of illustrations (O)

List of tables (O)

Beginning segments (e.g., foreword, preface, acknowledgements) (O)

End segments (e.g., epilogue, afterword, conclusion) (O)

Chapters/parts (O)

Notes (O)

Bibliography (O)

Index (M)

Colophon (O)

Errata (O)

Page numbers (M)

Blank page (M)


 

Serials

Entire Publication

Volume (M)

Issue (M)

Supplements (M)

Table of contents (M)

Index (at issue and volume level) (M)

Corrections and retractions (O)

Serial front matter (M)

Serial part (O)

Serial section (O)

Name index (if separate from other index) (O)

Subject index (if separate from other index) (O)

Errata (O)

Page numbers (M)

Blank page (M)

Articles

Article title (O)

Author (O)

Abstract (O)

Date (O)

Tables/figures (O)

Errata (O)

Page numbers (M)

Blank page (M)
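
The element list above lends itself to a simple completeness check. The following is a minimal sketch covering only the "All Materials" and "Monographs" groups. The dictionary layout and the function names are hypothetical, and how a project decides that an MA (mandatory if applicable) element applies is left to the caller.

```python
from typing import AbstractSet, Dict, List

# Requirement levels transcribed from the appendix: M, MA, or O.
ALL_MATERIALS: Dict[str, str] = {
    "Relationship to Other Resources": "MA",
    "Metadata Locations": "M",
    "Start image/page": "M",
    "End image/page": "M",
}

MONOGRAPHS: Dict[str, str] = {
    "Title page": "M",
    "Copyright page": "M",
    "Table of contents": "M",
    "List of illustrations": "O",
    "List of tables": "O",
    "Beginning segments": "O",
    "End segments": "O",
    "Chapters/parts": "O",
    "Notes": "O",
    "Bibliography": "O",
    "Index": "M",
    "Colophon": "O",
    "Errata": "O",
    "Page numbers": "M",
    "Blank page": "M",
}


def missing_elements(
    present: AbstractSet[str],
    required: Dict[str, str],
    applicable: AbstractSet[str] = frozenset(),
) -> List[str]:
    """List required elements that are absent, in the order they are defined."""
    missing = []
    for name, level in required.items():
        needed = level == "M" or (level == "MA" and name in applicable)
        if needed and name not in present:
            missing.append(name)
    return missing


def check_monograph(
    present: AbstractSet[str],
    applicable: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Check a monograph against the All Materials and Monographs groups."""
    return missing_elements(present, {**ALL_MATERIALS, **MONOGRAPHS}, applicable)
```

A monograph record that carries every mandatory element except the index would be reported as missing `"Index"`; optional elements are never reported.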

 


© 2000 Council on Library and Information Resources
