Developing Sustainable Digital Library Collections: Strategies for Digitization

Unedited draft for exclusive use of participants in the DLF Spring 2001 Forum


Abby Smith

Table of Contents

I. Introduction

II. Identification, Evaluation, and Selection

III. Institutional Impacts

IV. Summary and Conclusion

V. Sources


Libraries have been digitizing collections for a decade or more. The collective experience of these libraries has produced a depth of technical expertise and a set of reliable practices that are widely shared among digital library staffs and well-reported in a number of meetings and publications. A decade into this ongoing experiment with representing research collections online, technical practices have begun to codify and trends in selection policies have emerged. This paper will review existing selection practices in libraries that have significant digitization projects underway, identify selection policies and best practices where they exist, and discuss the long-term implications of the opportunities and constraints that shape digital conversion programs. This will not be a systematic review of what all research libraries are doing, but rather an analysis of significant achievements to date in order to identify good practices and benchmarks for success. Every library, regardless of size or mission, will need to determine for itself how and when digitization will move from being an experiment to becoming a collection development strategy well integrated into the daily practice of a library.

For the purposes of analysis, the study focuses on a subset of "first-generation" digital libraries, those that have been engaged in significant digitization projects for a while. Research was conducted by studying the Web sites of all Digital Library Federation members, as well as many other libraries and research institutions engaged in putting collections online. More important for analytical purposes were the site visits made to selected libraries: the University of Michigan, Cornell University, the University of Virginia, New York Public Library, New York University, and the New-York Historical Society. Some among them are so-called first-generation digital libraries and others are not, just as among them there are great differences in governance and funding, with some libraries within public universities, some within private, and some that are independent of an academic institution. These differences are fully reflected in the various approaches they take to selecting what to digitize, how to do so, and for which audiences.

Each library was given a set of questions about selection criteria that constituted the framework for investigation, and each institution organized its responses individually. (The Library of Congress, also included in this study, answered the questions in writing and no site visit was made.) The questions begin with the selection process and proceed through the creation of metadata, decisions about access policies, and user support systems.


While the great majority of research libraries have undertaken digitization projects of one type or another, only a few have reached the point where they are moving forward to developing full-scale digitization programs, rather than just a series of projects. How have they conceptualized the place of digitized collections in the provision of collections and services to their core constituencies? What have they done, or determined must be done, in order to move from project-based conversion to full-blown programs? How are they developing programs that are sustainable over the long term and scaled from project to program?

This report works from the assumption that to be sustainable, digitization programs should have certain intrinsic features.

A sustainable digitization program, then, would be fully integrated into traditional collection development strategies. Therefore, the assessment of what libraries have achieved so far is based on a close examination of two key factors common to sustainable collection development, be it of analog, digitized, or born-digital materials.

A strategic view can be revealed in many cases not only by looking at how closely the results serve the mission, but also by the decision-making process itself: who decides what to convert to serve what ends. When are the decisions made primarily by subject specialists based on existing collection strengths, and when is the selection process shaped by the needs of curricular development and other faculty needs? If the latter, then by what process are the faculty involved and how are teaching and research tools developed to meet their needs?

Sustainability of digital collections is dependent on careful life cycle management. How does the library budget not only for the creation of the digital scans, but also for the metadata, storage capacity, preservation tools (e.g., refreshing, migration), and user support, the sorts of things that are routinely budgeted for book acquisitions? How much of the program is supported by grant funding and how much by base funding? If presently grant-supported, what plans exist to make the program self-sustaining?

In the case of selecting for digitization, in contrast to purchasing or licensing born-digital materials, the rationale for expending resources to essentially re-select items already in the library's possession is necessarily more complex. In theory, a library would only choose to digitize existing collection items if it could identify the value that is added by digitization and determine that the benefits outweigh the costs. But in practice, over the past decade the research library community has boldly gone forward with digitization projects without knowing how to measure either cost or benefit. Because the technology is constantly changing, and with it the costs, budgeting models that compare even similar libraries can at times be meaningless or downright misleading. The only way for many libraries to get at the issue of cost is to undertake projects for their own sake in the expectation that documentation of expenditures will yield some meaningful data. Libraries that have been able to secure funding for projects, carefully document their activities and expenditures, and share that information with their colleagues have emerged as the undoubted leaders of the community, if only because of their policies to share their knowledge. Their experiences are necessarily more relevant for this report than those of others who have embarked on fewer projects or have failed to document and share their knowledge.

Besides cost, the other unknown factor in this first decade has been the benefit: the potential of this technology to enhance teaching, research, lifelong learning, or any number of possible goals that digitization is intended to achieve. How could we know in advance how this technology would be adapted by users other than ourselves? How could we conceptualize use of these digitally reborn collections except by extrapolating what we know from the analog realm? Regrettably, most academic institutions, despite their clearly stated goals of improving or at least enhancing research and teaching, have done less than they might have to gather meaningful data about the uses of digitized collections. While this report will address issues of costs and benefits, it should be remembered that as a community, we still have insufficient data on which to draw firm conclusions and derive recommended practices.

The report aims to synthesize experiences in order to identify trends, accomplishments, and problems common to all libraries and indeed many cultural institutions when they represent their collections online. A brief review of rationales for digitization is followed by consideration of the various ways that digitization activities affect an institution. What consequences, intended or not, result from selection decisions? And what factors, such as funding, may set constraints on decision-making? One of the chief factors influencing selection decisions is copyright. This topic will not be explored here in any detail because of its complexity, but the consideration that librarians must give to rights management is often cited as the first thing that crosses their mind, at times even unconsciously, when assessing collections for digitization. At program planning stages, copyright is often viewed from the point of view of risk management.1 Eliminating materials that are or might be under copyright reduces the risk of infringement to zero.

II. Identification, Evaluation, and Selection


There has been a great deal of thoughtful literature written on both selection for digitization and the management of conversion projects. Much of this literature is published on the Web and has become de facto "best practice" to the extent that many institutions applying for digitization grant funds use these documents to plan their projects and develop selection criteria. In addition to these guidelines, there are a number of reports about selection for digitization that range from project management handbooks and technical guides to imaging, to broad, non-technical articles aimed at those outside the library community who fund such programs. [DLF/RLG imaging guides, NEDCC, Smith, Gertz, de Stefano, Kenney and Rieger, LC]

In contrast to these numerous and widely available documents, very few libraries have formal written policies for conversion criteria. Those that have written documents tend to refer to them as guidelines, and they tend to be focused on technical aspects of selection and even more on project planning. When asked why an institution does not have a policy, the response is either that it is too early to formulate policies, that they have not gotten around to formulating them, or that the institution does not have written collection development policies and so is unlikely to write them for digitized collection development.

The focus of these documents is almost invariably not on the rationale for digitization, but on the planning of digital projects or various elements of a larger program. The University of Michigan, for example, does have a written policy and it clearly aims to fit digitization into the context of traditional collection development. [citation] It states that "Core questions underlying digitization should be familiar to any research library collection specialist.

"Is the content original and of substantial intellectual quality? Is it useful in the short term for instruction and in the long term for research?
Does it match campus programmatic priorities and library collecting interests?
Is the cost in line with anticipated value?
Does the format match the research styles of anticipated users?
Does it advance the development of a meaningful organic collection?"

These are fundamental collection development criteria that assert the importance of the research value of source materials over technical considerations, but they are quite general. The rest of the policy focuses not on how to decide which items from among the millions in the collection have priority for conversion, but rather on how anticipated use of the digital surrogates should affect decisions about technical aspects of the conversion, mark-up, and presentation online.

The selection criteria developed for Harvard offer far more detailed considerations. [citation] In common with the Michigan criteria, though, they focus largely on questions that come after the larger "why bother to put this in digital form rather than that" issues have already been answered. Creation of digital surrogates for preservation purposes is cited as one legitimate reason for selection; so are a number of considerations aimed not at preservation, but solely at increasing access. (Sometimes, of course, digitization does both at once, as in the case of rare books or manuscripts.)

The Harvard guidelines have proven to be useful to many beyond Harvard in planning a conversion project because they present a matrix of decisions that face selectors and are readily available on the Web. [citation to Indiana article and sites for report] The authors begin with the issue of copyright: whether or not the library has the right to reformat items and distribute them either in limited or unlimited forms. They then ask a series of questions derived from essentially two points of departure:

Source material: does it have sufficient intellectual value to warrant the costs? Can it withstand the scanning process? Would digitization be likely to increase use? Would the potential to link to other digitized sources create a deeper intellectual resource? Would the materials be easier to use?
Audience: who is the potential audience? How are they likely to use the surrogates? What metadata should be created to enhance use?

The answers to these questions should guide nearly all of the technical questions related to scanning technique, navigational tools and networking potential, preservation strategy, and user support.

The primary non-technical criterion, research value, is at heart a subjective one and relies on many contingencies for interpretation. What does it mean for something to have intrinsic research value? Are there many items collected by research libraries that do not? Should we give priority to items that have research value today, or those that will (probably) have it tomorrow? What relationship does current demand have to intrinsic value? Because these are essentially and unavoidably subjective judgments, the only things excluded under these selection criteria are things that do not fit under the camera, like architectural drawings, or things that are very boring or out of intellectual fashion. Interestingly, foreign language materials are nearly always excluded from consideration, even if they are of high research value, because they are deemed not cost-effective to convert. There are digital projects that have converted valuable historical sources, from Egyptian papyri to medieval manuscripts, into image files (such as the Advanced Papyrological Information System [APIS] and the Digital Scriptorium). But the conversion of non-English language sources into searchable text continues to be rare.

This high-level criterion of research value is also an intrinsic part of non-digital collection development policies. The difference in how the two play out in most libraries is that the acquisition of monographs, to take an example, fits into a long-standing activity that has been well defined by prior practice, and it governs the acquisition of new materials, those that are not already held by the library. (The issue of how many copies is secondary to the decision to acquire the title.) Selection for digitization is re-selection, and so the criteria for digitization, or repurposing, will be different in the end. The meaning of research value will also differ, because the methods of research, and the types of materials that are mined and how they are mined, are fundamentally different for digitized items than for analog ones. Several large digitization programs today are grounded in the belief that it is the nature of research itself that is "repurposed" by this technology, and the source material that yields the greatest return when digitized often surprises.

As one librarian said, the guidelines addressing selection that exist and are used routinely, whether they are official or not, are all "project-oriented" and it would be a mistake to confuse what libraries are doing now with what libraries should and would be doing if "we understood what higher purpose digitization serves." While guidelines for technical matters such as image capture and legal rights management are, on the contrary, extremely useful and should be codified, formal collection development policies are still a way off.


Libraries usually identify two reasons for digitization: to preserve analog collections, and to extend the reach of those collections. Most individual projects and full-scale programs, while perhaps giving priority to one over the other, end up serving a mix of both purposes. As librarians have learned from tackling the brittle book problem through deacidification and reformatting, it is difficult and often pointless to pick apart preservation and access. Indeed, when a library is seeking outside funding for digital conversion (apparently still the primary source of funding for many libraries), applicants tend to cite as many benefits of conversion as possible, and so preservation and access are usually mentioned in the same breath. Nonetheless, because it has been generally conceded that digital conversion is not as reliable for preservation purposes as microfilm reformatting, it is worthwhile to stop and consider what institutions are doing, and saying that they are doing, in terms of preservation per se.

II. 2. 1. Preservation

II. 2. 1. 1. Surrogates

The use of scans made of rare, fragile, and unique materials, from print and photographs to recorded sound and moving image, is universally acclaimed as an effective tool of preventive preservation. For materials that cannot withstand frequent handling or that pose security problems, digitization has proved to be a boon.

II. 2. 1. 2. Replacements

For paper-based items, there is a consensus among librarians that digital scans serve as the preferred type of preservation surrogates. They are widely embraced by scholars and preferred to microfilm. However, with some few exceptions, librarians will also assert that scanning in lieu of filming does not serve preservation purposes, because the expectation that we can migrate those scans into the future is simply not as great as our conviction that we can manage preservation microfilm over decades. There is the general hope, if not certain expectation, that the problem of digital longevity will soon be resolved. In anticipation of that day, most libraries are creating what they refer to as preservation-quality digital masters, a scan rich enough to use for several different purposes and created to obviate the need to scan the original again, often together with preservation master microfilm.

Only one institution, the University of Michigan, has a policy to scan brittle books and use the scans as replacements, not surrogates. They have created a policy for the selection and treatment of these books, and they explicitly talk of digital replacements as a crucial strategy for collection management. [See attached policy in Appendix II.] This policy is premised on their view that books printed on acid paper have a limited life span and that, for those with insignificant artifactual value, they are not only rescuing the imperiled information, but also making it vastly more accessible by scanning in lieu of filming. (The preservation staff continue to microfilm items identified by selectors for filming, as well as deacidifying volumes that are at risk but not yet embrittled.) The focus of Michigan's digital program is the printed record, not special collections, and they have made digitization a key collection management tool for those holdings. At Cornell, which also has incorporated digitization into collection management (that is, it is not for access alone), though not as systematically as Michigan, there is a preference for digital replacements of brittle materials with backup to COM (computer output microfilm) or replacement hard copies made from digital scans. The Library of Congress has also begun implementing preservation strategies based on digitization in the House and Garden project [citation]. Most libraries, though, elide the issue of digitally reformatting brittle materials because they scan chiefly special collections items.

For audiovisual materials, digital replacements appear to be inevitable, although standards for archival-quality re-recording have yet to be established. Because the recording media used for sound and moving image demand regular, frequent, and ultimately destructive reformatting, migrating onto digital media for preservation as well as access is acknowledged to be the only course to pursue for long-term maintenance. UCLA and the Library of Congress, both deeply engaged in audiovisual preservation, are looking to digitization for long-term access to analog as well as digital materials. This does not mean that these institutions will dispose of the original analog source materials, only that the preservation strategy for these items will not be based on routine use of that analog source material.

II. 2. 2. Access

In nearly all research libraries, digitization is viewed as service of collections in another guise, one that provides enhanced functionality, convenience, certain preservation considerations, aggregation of collections that are physically dispersed, and greatly expanded reach. Among all the various strands of digitization activities at major research institutions, there are essentially three models of collection development based on access: one that serves as outreach to various communities; one that is collection-driven; and one that is user-driven. All libraries engage in the first kind of access to one degree or another. Where one sees significant strategic differences is in their approach to the choice between mounting large bodies of materials in the expectation of use and collaborating closely with identified users to facilitate their data creation.

II. 2. 2. 1. Access for outreach and community goals

There are and will continue to be times when academic libraries create digital surrogates of their analog holdings for reasons that are important to the home institution yet not directly related to teaching and research. Libraries are and will continue to be parts of larger communities that look to them for purposes that transcend the educational mission of the library per se. As custodians of invaluable institutional intellectual and cultural assets, libraries will always play crucial roles in fund-raising, cultivating alumni allegiance, and public relations.

Occasions for selective digitization projects include exhibitions, anniversaries (when archives or annual reports get into the queue), a funding appeal (usually as a quid pro quo for donation), and efforts to build institutional identity. Careful consideration needs to be given to what goes online for whatever purpose because, once a collection is online, it becomes de facto part of the institutional identity. Image building is a critical and often undervalued part of ensuring the survival of the library and its host institution. As custodians of the intellectual and cultural treasures of a university, libraries have an obligation to share that public good to the advantage of the institution.

II. 2. 2. 2. Collection-driven selection

Collection-driven selection refers to the process of deciding what to scan on the basis of a set of materials identified by library staff as having great research potential online. The term "collection-driven," like "user-driven," is familiar from the preservation microfilming projects of the 1980s and 1990s. Some librarians decided that books should be selected on the basis not of their documented use (by checking circulation statistics or going with items that cross the circulation desk), but of the coherence of a set of monographs (and occasionally serials) arranged either by subject matter or by date of publication. To extend the use of the term to special collections items, either archival or non-print format, means to choose items that exist within a library as a defined collection by format (all incunabula, all daguerreotypes) or genre of literature (anti-slavery pamphlets, travel literature gathered as a discrete group and held as such in the rare book department, photographs of the Reconstruction-era South given to the library by a donor and known under that name, Sanborn insurance maps, and so forth).

Within each group, libraries attempt to be as comprehensive as possible in putting items from the collection online, to simulate the comprehensive or coherent nature of the source collection. Examples of such collection-driven digital collections are: the Making of America (a subject and time period); Saganet (a set of special collection items held by certain repositories that relate to Icelandic sagas); the Sam Nunn Letters project at Emory University; the Hoagy Carmichael site at Indiana University; and the Scenery Collection at the University of Minnesota.

In a recent survey of selection strategies at 25 research libraries, Paula de Stefano found that "the most popular approach to selecting collections for digital conversion is a subject-and-date parameter approach applied, by and large, to special collections, with little regard for use, faculty recommendations, scholarly input, editorial boards, or curriculum." (de Stefano 2000, 67)

II. 2. 2. 3. User-driven selection

Some libraries have decided that they will not digitize collections for general access purposes, but rather only in response to explicit user-driven needs. At New York University (NYU), for example, the focus is on the user as part of a plan that carefully prioritizes relatively modest resources for digitization. Collections of cultural significance are presented through online exhibitions and other modes of Web outreach rather than full collection conversion. This library has decided to concentrate on working with faculty and graduate students to develop digital objects designed to enhance teaching and research through its Studio for Digital Projects and Research and its Faculty Technology Center. An even more pressing need for NYU is the development of an infrastructure to deal with born-digital materials and, in an institution with extensive programs and collections in the arts and performance studies, with multimedia archives converted to digital form for presentation. The library plans to give grants to faculty for development of teaching and research tools, but at present much of the effort of the library staff is going toward preparing for that time, seen to be in the immediate future, when the demands of born-digital materials will obviate any initiative to create large collections of digital surrogates.

Harvard University libraries are taking the same approach, concentrating on building an infrastructure to support born-digital materials first and foremost, rather than building collections of digital surrogates of existing collections. While the holdings of the over one hundred repositories in the university certainly comprise a rich collection of cultural heritage, Harvard will attempt to serve the Harvard community, not necessarily a larger community beyond its campus. [citation to Flecker article] "While in many instances the digital conversion of retrospective materials already in the University's collections can increase accessibility and add functionality and value to existing scholarly resources, it is strategically much more important that the library begin to deal with the increasing flood of materials created and delivered solely in digital format." Although $5M of the $12M allocated is for content development, so far the majority of content development comprises conversion-for-access purposes. Slated for review is the collection of collections that have been mounted so far. "One specific issue being discussed is the randomness of the areas covered by the content projects. Since these depend upon the initiative of individuals, it is no surprise that the inventory of projects undertaken is spotty, and that there are notable gaps....It is also possible that specific projects will be commissioned to address strategic topics." However, the gaps Flecker refers to are not content per se (specific subjects that would complement one another) but content that demands different types of digital format (encoded text, video, sound recordings, etc.). This is a technical criterion, of course, independent of collection development, and fully consistent with the purposes that Flecker says the initiative is to serve.

As a state-supported institution, the University of Virginia (UVA) has developed access projects that serve state and regional needs, based primarily on their special collections holdings. But they also have several digital conversion initiatives that are explicitly user-driven, and these programs exist both in the library and elsewhere on campus. In the Institute for Advanced Technology in the Humanities (IATH), an academic center located in the library but administratively separate, scholars develop deep and deeply interpreted and edited digital objects that are, by any other name, publications. Examples include the projects on the writers Blake, Rossetti, and Twain, as well as the Valley of the Shadow Civil War site. Within the library there is the Electronic Text Center, where the staff choose humanities texts to encode and put up without the interpretive apparatus of the IATH objects. These texts are more analogous to traditional library materials that are made available for others to interpret, except, of course, that encoded text is a far more complicated creature than the OCR'd [Optical Character Recognition] text that other libraries are creating. To some extent, this latter center is, if anything, technology-driven, in that it seeks to pursue the potential of various encoding schemes as part of its explicit agenda.

The Cornell libraries have also tried both collection- and user-driven approaches to selection. In several instances, staff have begun with expressed interests of faculty, say for teaching, and have developed digital collections based on those interests. In each case, though, library staff have expanded their brief from the scholar and augmented faculty choices with related materials. It seems that a faculty member's interests are usually fairly circumscribed and librarians will select a good deal of additional materials on a topic, such as Renaissance art, to add depth to a selection so that the materials offer greater choices to the users. As a result, a selection of materials becomes a collection, with a wider scope of content. Research librarians are used to thinking of collections as being useful to the extent that they offer comprehensiveness or depth. Scholars, on the other hand, will take such comprehensiveness for granted and concentrate instead on making choices and discriminations between collection items in order to build a case for an interpretation. While these two views of collections are complementary, when it comes to selection for digitization they actually create the most difficult choices facing libraries in digitization programs. Selection is an "either/or" proposition. It seldom tolerates "both/and" solutions. Those historians who are working on Gutenberg-e projects sponsored by the American Historical Association are now beginning to encounter the limitations that librarians live with every day. When faced with the opportunity not only to write their text for electronic distribution, but also to present their resources through digital surrogates, they find themselves facing dilemmas familiar to digital collection builders everywhere: how much of the source material is enough to represent the base from which an argument was built? How can one select materials that give a sense of the scope of the original from which the scholar made his choices? And why is digitization of even a few core files so expensive?

Many of the scholar-driven projects may be coherent digital objects in themselves. But they would, by library standards, fail the test of comprehensiveness as a collection. Indeed, one could say that the value added by the scholar lies precisely in that selectivity. Some of those projects, most notably the Valley of the Shadow, attempt to bring together materials that complement and enrich each other but do not try to comprehend the great universe of materials that could be considered complementary. These new digital collections are somewhat analogous to published anthologies of primary sources, carefully selected by an individual or an editorial team for heuristic purposes or to serve as supporting evidence for an interpretation. Other projects driven by scholar selection, such as the "Fantastic" collection of witchcraft source materials at Cornell or the Women Writers Resources Project at Emory University, do not claim to have comprehensiveness, but serve rather as pointers to the collection by presenting a representative sampling of it. Yet others, such as the Blake archive or the Advanced Papyrological Information System, serve primarily to collocate items to form a new virtual collection that then serves as a new paradigm of critical edition.

III. Institutional Impacts


Selecting for scanning must include an assessment of the source's physical condition and readiness for the camera. For those items that are rare, unique, fragile, or otherwise of artifactual value, preparation for scanning usually demands the attention of conservators, if only to assess the readiness of an item for the camera. There is often, though, some treatment that is required, from strengthening or mending paper to removing environmental soil. In the case of medieval manuscripts or daguerreotypes, the institution will need to provide appropriate expertise, either from within the ranks of current staff or from a consultant or vendor. This in turn involves more time and resources (for writing contracts, for example, and checking items that return from a vendor), and this can divert resources normally dedicated to other priority conservation work.

The disposition of scanned source materials that are not unique or rare is a challenging subject that most libraries are just beginning to grapple with. When the time comes that digitization is considered an acceptable if not superior alternative to microfilm for preservation reformatting, and those items that can be networked are, what criteria will libraries use to decide what to keep and what to discard? For materials that are rare or unique that question should not arise. But what about back journals that will be available through a database like JSTOR, or American imprints that the University of Michigan and Cornell have scanned and made available without restrictions on their Making of America site? The library community never reached consensus about this issue for microfilming. But twenty years from now, when many scholars may well prefer remote access to these materials to seeking them out from a library, will the library community have developed a collective strategy for preserving a defined number of originals for access purposes and reducing the redundancy of print collections? How will researchers who wish to have access to originals be able to find out where they are and how they can be viewed? Ongoing discussions about registering digitized items in a nationally coordinated database should include consideration of noting the location of a hard copy available for access.

For certain media, such as lacquer sound discs or nitrate film, the original or master should never be used for access purposes due to the extreme fragility of the carrier. Service of those collections should always be done on reformatted (access) media. However, that is an expensive proposition for any library, and there is great resistance to pushing the costs of preservation transfer onto the user. Regrettably, a great number of recorded sound and moving image resources are played back using the original. Until digitization is an affordable option for access to these media, their preservation will remain at very high risk.


To date, fewer libraries have digitized significant series of books and periodicals than special collections. There are several reasons for this selection strategy that are commonly given, and others that can be inferred.

There has been a preference for digitizing visual resources over textual sources, in part because they work so well online, in part because visual resources do not require the additional expense of OCR or text encoding that adds value to textual materials (though creating metadata for visual resources that are not well cataloged can be expensive as well). Printed sources do not require such metadata creation, of course, but simple page images of non-rare texts do not provide the enhanced access that most researchers want from digital text. Nearly all selection criteria call for a specific additional functionality, such as browsing and searching, from text conversions. There are also a number of commercial interests who are working with publishers or libraries to provide digitized versions of texts that have a potential market, such as Early English Books Online (EEBO).

Another reason libraries select their special collections is simply the example set by one of the first large-scale digitization programs, American Memory, which gives preference to rare or uncommon materials over those more commonly held. This selection strategy has been given financial incentive through the line of Ameritech funding that enables the Library of Congress to regrant funds to libraries digitizing similar and complementary collections.

In addition to these considerations, there is a sense that, by digitizing materials that are unpublished or not commonly held, a library can help to build institutional identity. This can be important in encouraging alumni loyalty or in recruiting students. This assumption (that special collections build institutional identity and general collections do not) is actually challenged by the success of the Making of America (MOA) projects at Michigan and Cornell, and the texts encoded at UVA. These institutions have achieved considerable renown for their collections of monographs and periodicals. Yet it is reasonable to assume that such massive digitization programs are not easily replicated by institutions with smaller digital infrastructure or by those that are not early adopters of the technology. For those institutions, special collections offer a smaller-scale approach to developing a Web presence.

But, as the two MOA projects highlight, one of the challenges faced by libraries mounting print publications is that of how much is too little and how much perhaps too much. The sense that textual items need to exist in a significant or critical mass online stems in part from the fact that these books and magazines do not have quite the same cultural frisson as Jefferson holographs or Brady daguerreotypes.

III. 3. Special collections

In 1995, at the time that several academic libraries were working together to mount text-based Americana in the Making of America project, the Library of Congress inaugurated its digitization program, American Memory, based on its Americana special collections. The program was ambitious (it targeted 5 million images in 5 years) and has been influential in large part because of the extensive documentation that the library has mounted on its Web site, and because of the well-publicized redistribution grants that it gave under its LC/Ameritech funding. The requirements for that grant were based on Library of Congress experience, and the requirements of other funding agencies, including the Institute of Museum and Library Services (IMLS), have been heavily influenced by them. The only other library that has similar collecting policies and a similar governance and funding structure is the New York Public Library, and the digital program it is scheduled to implement over the next few years bears remarkable resemblances to the Library of Congress's in its ambitious time frame, its focus on special collections, and its stated goal of making access for the general public as high a priority as service to scholars. They share, in other words, the same strategic view of digitization, one well in line with the realities of their roles as public institutions, their audience, collection strengths, and governance.

Indiana University reports that, in the early stages of their digitization program, they used the LC/Ameritech Competition proposal outline to assess the merits of collections for digitization. This led to a canvass of their libraries for "their most significant collections, preferably ones in the public domain or with Indiana University-held copyrights." Then, with (special collection) candidates in hand, they examined them for what they identified as the basic criteria: "the copyright status of the collection; its size; its popularity; its use; its physical condition; [and] the formats included in the collection... and the existence of electronic finding aids." [Brancolini Library Trends Spring 2000 v 48]

There is also evidence that those who depend on outside funding (the great majority of libraries digitizing collections) believe that it is easier to raise funds if they propose to digitize special collections because they are more interesting and have greater appeal to the funders. This hypothesis is untested (although MOA has received major grant funds, so perhaps it has been tested and proven untrue), yet this notion of the funding bodies' predilection for special collections continues to have the power to persuade.

While there are many reasons that academic libraries decide to digitize special collections, the rationales of the two public institutions merit special consideration. The two public libraries base their selection decisions on their understanding that they are not libraries within a specific academic community, with faculty and students to set priorities. Rather, they serve a very broad and often faceless community, the general public, and so wish to make available things that both scholars and a broader audience would find interesting. Because their primary audience is not academic, they have no curricular or educational demands to meet. They can focus directly and exclusively on their mission as cultural institutions. Moreover, as libraries that have rich cultural heritage collections held in the public trust, they feel obligated to make those unique, rare, or fragile materials that do not circulate available to patrons not able to come to their reading rooms. Their strategic goal, then, is cultural enrichment of the general public. None of the research libraries with comparable collections, such as the Harvard and Yale University libraries, claims that as a goal. And yet, as de Stefano points out, there are academic libraries that are mounting special collections of broad public appeal, not matched to curricular needs (de Stefano 2000, 67). She cautions that "It is only a matter of time until the question emerges as to how long the parent institutions will be satisfied with supporting the costly conversion of their library's materials to improve access for narrowly defined audiences that may not even be their primary local constituents." This anxiety has been echoed on several campuses in the conversations held during research for this paper.


Digitizing either general or special collections presents challenges around size: how many items from any given collection will be sufficient to create added value. "Critical mass" is one criterion for selection that shows up in nearly all the written guidelines for selection and is commonly noted in conversation. The magic of critical mass, in theory, is that if you get enough related items up in a commonly searchable database, then you have created a collection that is richer in its digital instantiation than in analog. This is premised on the notion that the technology has a transformative power, that it can not only re-create a collection online, but give it new functionality, allow for new purposes, and ultimately create new audiences that put to it novel queries. It does this by, for example, turning static pages of text or numbers into a database. Monographs are no longer limited by the linear lay-out of the bound volume (or microfilm reader). By transforming text into searchable text, librarians can create whole new resources for their patrons from old ones and even make items that have had little or no use into something that gets a lot of hits.

But how much is enough? A critical mass is enough to allow meaningful queries through curious juxtapositions and comparisons of phenomena, be it the occurrence of the word "chemise" or the census returns from 1900. A large and comprehensive collection is valuable because it provides a context for interpretation. But in the digital realm, it turns out that we really mean something else by this term, critical mass, something ill-defined and quite new. The most salient example of this new phenomenon is the Making of America (MOA) at the University of Michigan, a database of thousands of nineteenth-century imprints at risk of or already embrittled. While the books themselves were seldom called from the stacks, the MOA database is heavily used, though not primarily by students and teachers of Michigan. Current analysis shows that MOA is used most heavily by members of the University of Michigan community. But among its largest users is the Oxford University Press, which mines the database for etymological and lexical research. Is this database heavily used because it is easily searched, and the books were not? Because one can get access to it from any computer in any time zone, while the books were available only to a small number of credentialed users? Were the books as they languished in remote storage not of research value but now, as a database, they are?

In the case of general collections, or imprints, one must select enough that they, taken together, create some corpus that is coherent. In one case it may be time periods that set the parameters, in others it might be genre or subject. In many ways what makes a digitized collection of print materials useful is the ability to search across titles and within subjects. The more items in the collection, the more serendipitous the searching. But as JSTOR has shown, incremental increase in the number of titles in the corpus is possible because they already exist within a context and a search and retrieval protocol.

"Critical mass" could more accurately be thought of as "contextual mass," a (variable) quantity of materials that provide a context for evaluation and interpretation. In the analog realm, searching within a so-called critical mass has always been very labor-intensive, taking great human effort to reveal the relationships in and among items in that collection. Once those items are online in a form that is word-searchable, one has a mass that is accessible to machine searching, not the more arduous human researching.

But for archival-type collections, which are not necessarily text-based and usually under rougher bibliographical control than published works, the amount of material needed to get a critical mass almost defies the imagination, or at least challenges the budget. If a collection is very large, too large to digitize (a photo morgue or institutional records, say), staff may choose to digitize a portion that represents the strengths of the collection. But what is that? How much is enough? These are subjective decisions, and they are answered differently by different libraries. In the public libraries, with no faculty to provide advice, the decisions have been made by the curatorial staff. The Library of Congress has had outside scholarly consultants and educational experts from time to time to aid in selection decisions, but the actual selection decisions are always made by curatorial staff, within limits indicated by scanning and preservation experts. New York Public relies on a curatorial staff that is expert in a number of fields and, like most cultural heritage institutions, has long corporate experience in selecting for exhibitions. Curatorial staff in academic special collections libraries, such as those with rare books, visual resources, and manuscript and archival collections, often have the opportunity to work closely with faculty or visiting fellows who collaborate in shaping a digital collection through selection and even adding descriptive and interpretive text to accompany items.

But many curators see doing anything less than a complete collection as "cherry-picking" that lacks intrinsic value to the research mission of the institution. Others are less severe and cheerfully admit that for most researchers, a little bit is better than nothing at all and very few researchers mine any single collection to the depth that we are talking about. Those who do, they think, would end up coming to see the collection on site at some point in any event. These judgments are generalized from anecdotal experiences, by and large, not from objectively gathered data. When asked, for example, about how research techniques in special collections may be affected by digitization, some librarians asserted that research will be pursued by radically different strategies inside of a decade. Others think that research strategies for special collections materials would not change, even with the technology. The important thing, in their view, is not to get the resources online, but to make tools for searching what is available in libraries, such as finding aids, readily accessible on the Web. New York Public has secured money to do long-term studies of (digitized) special collections users in order to gather information about use and test assumptions about users. More needs to be done, and a significant portion of grant-funded digitization, especially that supported by federal and state funds, should include some meaningful form of user analysis in the award.

The California Digital Library (CDL) has inaugurated a project, called California Cultures, designed to make accessible "a 'critical mass' of source materials to support research and teaching. Much of this documentation will reflect the social life, culture, and commerce of ethnic groups in California." [citation] The collection will comprise about 18,000 images. CDL sees collaboration as a key element in sustainability. Because of funding and governance issues, the CDL believes that it must foster a sense of ownership and responsibility for these collections among creators statewide, locality by locality. Its access policies are derived from this view to the extent that it has built a single place where everyone can see what aggregated collections can

The role of scholars in selecting a defined set of contextually meaningful sources often works well in certain disciplines for published items. Agriculture and mathematics are examples where scholars have been able to come up with a list of so-called core literature that is amenable to comprehensive digitization. By way of contrast, curators may do a better job than scholars in selecting from special collections of unpublished materials (musical manuscripts, photo archives, personal papers). These are the sorts of materials that usually only curatorial staff are familiar enough with to make fine assessments. While there are certainly exceptions to this rule, often the sheer quantity of materials from which to select makes the involvement of scholars in all decisions impractical and hence, unscalable.

Scholars as data-creators tend to have a different concept of the term critical mass. Projects such as the Blake Archive and the Digital Scriptorium are built with the achievement of a critical mass for teaching and research in mind. [citation] A collection-driven text-based program such as MOA can convert massive amounts of text and make it searchable, and it can put up materials without an interpretive framework. Other projects such as APIS and, to a large degree, American Memory, by contrast invest time and money in creating interpretive frameworks and item-level descriptions that never existed when the items were analog and confined to the reading room, served by knowledgeable staff. In many ways, this type of access is really a new form of publishing, not library service as it is traditionally understood.


The idea of coordinated collection development of digital collections is a powerful one. It motivated several libraries, from Berkeley and Michigan to Cornell, to work together to mount several collections from their own holdings that could all be termed part of one, "The Making of America." Given the resources that must be dedicated to creating digital collections and the resources it takes to build the infrastructure that allows access to them, it would seem that the only way to build truly scalable collections is through some cooperative effort.

But for all the talk of building federated collections that will aggregate into a digital library with depth and breadth (that is, critical mass), the principle of "states' rights" is nonetheless the standard. Each institution decides on its own what to digitize, and usually does so with little or no consultation with other libraries. There are funding sources that require collaboration in some circumstances (the Library of Congress' Ameritech grant is an example), but the extent of collaboration usually has to do with using the same standards for scanning and, at least sometimes, description. Selection is not truly collaborative; it could more properly be characterized as "harmonized thematically." Institutions usually make decisions based on particular institutional needs rather than on consensus community priorities.


Some library staff voice their anxieties that institutional concerns, such as fund raising, public relations, and special projects, divert too many resources from more academically defensible projects or from the core mission of the library, whatever it might be. Most library administrators show an acceptance of this role and some use it to the library's distinct advantage. Even a "vanity project," if managed properly, will bring money into the library for digitization and provide the kind of training and hands-on experience that is necessary to develop digital library infrastructure and expertise. The key to building on such a project is to be sure that all the library's costs, not only scanning but also creating metadata, migrating files, and so forth, are covered. Such projects, done willingly and well, usually enhance the status of the library within the community and seldom do long-term harm. The issue only bodes ill when libraries deliberately seek funding for things that are not core to institutional mission and when existing staff and management overhead are diverted to support low-priority projects. As argued above, outreach can properly be considered part of mission work.

Academic libraries that are funded by a mix of private and public monies, such as Virginia and Cornell, are liable to face special pressures to serve research and teaching but also to devote resources to putting online materials that serve state and regional interests. These need not be mutually exclusive, of course, and even Harvard University has acted to demonstrate its good citizenship by contributing items of interest to Cantabridgeans on its publicly accessible Web site.

For public institutions, digital programs offer a unique way to serve taxpayer communities that they have not been able to reach heretofore. For example, online distribution of collections is the only way the Library of Congress can provide access to its holdings in all Congressional districts. For the primary funders and governors, Members of Congress, who have built and sustained this library on behalf of their constituents, this rationale is compelling. New York Public also has to answer to a jealous government that, while it does not financially support the library in full, places high demands on the library to fulfill a public mandate. Where academic libraries with dual funding streams (private and public) are most vulnerable is when the state expresses some expectation that the university library will mount materials from its collections that are aimed at K-12. There is much (largely unsubstantiated) talk of how access to primary source materials held in research institutions will transform education in the K-12 community. This is a hypothesis that needs to be tested. Nonetheless, for academic institutions, the fact that much grant funding is tied to K-12 interests has resulted in trying to shape research-level materials into a K-12 mold to secure funding, or at least masking a research collection as one that is also suited for younger audiences.

But there is no doubt that public institutions are seen as holding a promise to improve the quality of our civic life if they provide greater access to their richest holdings. In one sense, it is a measure of the high esteem that our society holds libraries in that New York Public and the Library of Congress have been extremely successful in finding public-spirited citizens willing to make extraordinary financial donations in order to "get the treasures out." This level of philanthropy, numbering in the tens of millions of dollars for each library, is simply unthinkable in any other country. While these libraries may be accused of pandering to donors on occasion, or are blamed for not paying enough attention to the academic community by digitizing materials that are not in demand first and foremost by scholars, the fact is that public libraries, like the libraries in state universities, are not designed to serve exclusively, or even primarily, the scholarly community. This does not skew selection for digitization as drastically as some assert. Donors may express an interest in a particular type of material, but they end up choosing from a set of candidate collections that have been proposed by curatorial divisions and vetted by preservation and digital library staff for technical fitness. In terms both of process and result, they differ little from their private academic counterparts.

It is worth noting that in both public and academic libraries, some curators who are active in special collection development advocate for digitization because they see it as a way to induce further donations. For them, the promise of access is a useful collection development tool because digital access advertises what the library collects and demonstrates a commitment to access.


The scarcity of cataloging or description that can be quickly and cheaply converted into metadata is often a decisive factor in excluding a collection from digitization. Given that creating metadata is usually a more expensive activity than the actual scanning, there is a need to take advantage of existing metadata, better known as cataloging. Often money to digitize has come with a promise by the library director that the library will put up several thousand (even several million) images, a daunting pledge. To mount five million images in five years, as the Library of Congress pledged to do, has necessitated giving priority to large collections that already have extensive bibliographical controls. New York Public is likewise giving selection preference to special collections that already have some form of cataloging that can be converted into metadata in order to meet production goals. In this way, expedience can theoretically be happily married to previous institutional investments. These libraries have put enormous resources in past decades into creating descriptions, exhibitions, finding aids, and published catalogs of prized institutional holdings. One can assume that a collection that has been exhibited or made the subject of a published illustrated catalog has demonstrated research and cultural value.

Some collections that are supported by endowments can also make the transition to digital access more easily than others, because funds may be available for this within the terms of the gift. The Wallach collection of arts and prints, for example, at the New York Public, has been put online as the Digital Wallach Gallery. There are a number of grant applications that not only build the cost of metadata creation into the digitization project, but also appear to be driven in part by a long-standing desire on the part of a library to get certain special collections finally under bibliographical control.

It can be quite difficult, though, to harmonize the descriptive practices that were prevalent 40 years ago with what is required today. The expansive bibliographical essays that once were standard for describing special collections need quite a bit of editing to make them into useful metadata. It is not simply a question of standards, which have always been problematic in special collections anyway. It is the fact that people research and read differently on the Web than when sitting with an illustrated catalog or finding aid at a reading desk. For better or for worse, descriptive practices need to be reconceptualized for presenting these types of materials online. This rethink is several years off, as we have as yet no long-term understanding of how people use special collections online.

For monographs and serials, the genres for which the MARC record was originally devised and for which it is a well-understood standard, retooling catalog records need not be complicated or expensive. For those materials that are published but not primarily text-based, such as photographs, posters, recorded speech or musical interpretations, the MARC record has noted limitations and those tend to be accentuated in the online environment. Unpublished materials share this dichotomy of descriptive practice between textual and non-textual. For those institutions that have chosen to put their special collections online, items that often lack uniform descriptions, tough decisions must be made about how much information can be created in the most cost-effective way. In some cases, re-keying or OCR can be used to produce a searchable text in lieu of creating subject access. But for handwritten documents, non-Roman scripts, and audio and visual resources, searching remains a problem.

The online environment is one in which the context for interpretation needs to be far more explicit than in the analog realm. It is interesting to think about why that is so. Are librarians creating too much descriptive material for online presentation of those collections that have successfully been served in reading rooms with no such level of description, or is it too little? Is it that librarians make assumptions that the level of sophistication or patience in the online user is far lower than that of the onsite researcher? There does seem to be a general operating principle that an online patron will not use a source no matter how valuable, if it is accompanied by minimal-level description. This may be a well-founded principle, and it is certainly true that the deeper and more structured the description, the likelier it is that the item will be found through the various searching protocols most in use. But by removing research collections from the context in which they have traditionally been used-the reading room-one also removes the reference staff who can guide the patron through the maze of retrieval, advise about related sources, and so forth. These are materials that the general public may not have experience or training in using, and yet they are readily available online. For such patrons, these newly available resources can still be inaccessible because of lack of research skills.

The ease of finding digitized items on library Web sites varies to a great degree. There are a few sites that are constructed in a way that makes finding digitized collections almost impossible for people who do not already know they exist. Others have integrated the surrogates into the online catalog and on OCLC and/or RLIN. Some DLF members, those whose primary purpose in digitization is to increase access to special collections and rare items, have expressed willingness to expose the metadata for these collections to a harvester using a technical framework established for the Open Archives Initiative.
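To make the idea of exposing metadata to a harvester more concrete, the following is a minimal sketch of how a harvester might form a request for a library's records. It assumes the request syntax of the OAI Protocol for Metadata Harvesting as later standardized (the `ListRecords` verb and the `oai_dc` metadata prefix); the repository address `https://library.example.edu/oai` and the set name `special-collections` are hypothetical and do not refer to any library discussed here.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a harvesting request URL for a repository.

    base_url is a hypothetical repository endpoint; set_spec optionally
    limits the harvest to one named group of records.
    """
    # "verb" and "metadataPrefix" are request parameters of the OAI
    # harvesting protocol (assumed here in its later standardized form).
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
url = build_list_records_url(
    "https://library.example.edu/oai", set_spec="special-collections"
)
print(url)
```

The point of the sketch is only that the library publishes descriptions at a stable address in a shared format; the harvester, not the library, does the work of aggregating them into a cross-institutional search service.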


In contrast to selection criteria guidelines, there is much less written about how to plan for the access and preservation of digitally reformatted collections over time. This is in part because we know little about maintaining digital assets for the long haul. We have learned a great deal already through the agency of failed or deeply flawed efforts-those of the "we'll never do that again!" variety told of various CD projects, for example-but such lessons tend to be only informally communicated, for understandable reasons. Some exceptions include the University of Michigan, one library that has a clear view of what role digitization plays-that of collection management and preservation-and so has developed and published preservation policies that support those goals. The California Digital Library is also an exception, perhaps because, as a central repository, the need to establish standards and best practices that their contributors must adhere to is paramount to building confidence as well as collections. Harvard University has published information about its plans for a digital repository, and the Library of Congress has also put online much about its planned audiovisual repository in Culpeper. General information about the preservation of digital files can be found on the PADI, CEDARS, and CLIR sites (citation)

Nearly every library declares its intention to preserve the digital surrogates that it creates and, in the case of the Library of Congress, it has also pledged to preserve those surrogates created by other libraries under the auspices of the National Digital Library Program. (Citation: the LC preservation strategy described in some detail in RLG DigiNews.) In reality, however, many libraries have created digital surrogates for access purposes and may have no strategic interest in maintaining those surrogates with the care that they would if they had created those files to serve as replacements. Libraries nonetheless are uncomfortable at this point coming out and saying that they may have a limited commitment to many of their surrogates, should push come to shove. Even those who are creating surrogates for access purposes alone still declare an interest in maintaining those surrogates as long as they can, because the original investment in the creation of digital files has created something of enormous value to their patrons. Moreover, the cost of having to recreate those surrogates and the physical stress it might impose on the source materials argue for maintaining those files as long as possible.

The mechanism for long-term management of digital surrogates is in theory no different from that of born-digital assets. While refreshment and migration of digital collections have occurred in many libraries, the protocols and policies for preservation are clearly still under development. Many libraries have been sensitized to the fact that loss can be simple and catastrophic, with causes ranging from the wrong choice of (proprietary) hardware, software, or medium on which to encode information to negligent management of metadata. The Y2K threat that libraries faced during 1999 has led to systemic improvements in many cases. Not only did institutions become aware of how deleterious it is to allow different software to proliferate, but they also developed disaster preparedness plans and often were given funds for infrastructure upgrades that might have been postponed or not funded without the general sense of urgency that the looming crisis provided.

Libraries are anticipating the day when they must develop strategies for handling digital objects created by faculty outside the purview of the library. These are the often elaborate constructions done by individual scholars or groups of collaborators that the library hears about only after the critical choices of hardware, software, and metadata have been made, often by people wholly unaware of the problems of long-term access to digital media.

An increasing number of library managers express concern about the materials created by faculty that are "more than a Web site" yet less, often far less, than what the library would choose to accession and preserve. While libraries acknowledge that this is a growing problem, none have been forced to do much about it yet, and thinking about how to deal with faculty projects is just now evolving. Predictably, those that are collection-driven in approach are working to build a system for selecting what the library wishes to accession to its permanent collections. Cornell is developing criteria that individuals must work to if they expect that the library will provide "perpetual care." [citation] CDL already has those guidelines in place. Michigan has a well-articulated preservation policy, one that is detailed enough to support its vision of digital reformatting as a reliable long-term solution to the brittle book problem. (See the following policies:....)


In terms of user preferences and support, there is also little understanding of how research library patrons use what has been created for them. Most libraries recognize that the collections they now offer online require different types of support for users than they have traditionally given readers in the reading room. In many cases, user support has been developed for "digital collections" or "digital resources," terms that almost invariably denote born-digital (licensed) materials. The Library of Congress, which specifically targets a K-12 audience, has three reference librarians for its National Digital Library Program Learning Center. Libraries as a rule have not been reallocating staff to deal specifically with digitized collections. Hit rates and analysis of Web transactions have yielded a lot of quantitative data about access to digital surrogates, and these data have been mined for any number of internal purposes, from "demonstrating" how popular sites are to making gross generalizations about where users are dialing in from. Qualitative analysis is harder to derive from these raw data, and, as a rule, few in-depth studies have attempted to look into how patrons are reacting to the added functionality and convenience of materials now online. Libraries have been keeping careful track of gate counts, for example, but when they go up or go down, what conclusion are we to draw about the effect of online resources on use of on-site resources?

One of the exceptions to this rule is not a library, but the journal archiving service, JSTOR, which rigorously tracks the use of its resources. It analyzes its users' behavior because it needs to recover costs and hence must stay closely attuned to demand, within the constraints of copyright issues. Looking ahead, such close analysis of the ways in which researchers use specific online resources, and especially how they do or do not contribute to the productivity of faculty and students, will be a prime interest for libraries and further work must begin now to complement this work with analysis of free Web sources mounted by libraries.

Most libraries report having classes and other instructional options available for both students and faculty. Some librarians report that instruction is not really necessary for undergraduates, who are quite used to looking first online, but that general orientation to library collections is needed more than ever.

Good Web site design to create sites easily found, easily navigated, and readily comprehensible is an often overlooked aspect of access. Even within the first-generation libraries, there is an astonishing variety in how well, or poorly, a site is constructed. Even for fairly sophisticated users, finding a library's digital collections can involve going through six or seven screens before reaching descriptions of what is available. Having a professional design team that keeps a site up-to-date and constantly reviews the site for improvements may be yet another expense, but considering that the Web site is the front door to the collections, it would seem penny-wise and pound-foolish to ignore design and marketing.

IV. Conclusions and recommendations

It is clear that digitization, like other collection development strategies, works to the extent that it supports the mission of an institution. In some libraries, staff feel driven by the need to have some digital projects, whether or not library leaders have made clear what purpose the projects ultimately serve. Certain approaches work only for large libraries and make no sense for small ones; others work for large public libraries, but not large academic ones. Not all strategies are scalable, and any digital projects that exist in splendid isolation from other parts of the home institution or other libraries risk turning into a waste of resources in the end.


A useful way of looking at the issue of the strategic value of digitized collections is to ask what would happen if these programs were self-supporting. Would the money for creating digital surrogates come from the acquisitions budget? From preservation? If the program were supported through a separate line, from which pocket of money would these funds be reallocated? Should fees be charged for access, at least in some cases?

Michigan, which says that it left project-based digitization three years ago, has ongoing budget support for the staffing, equipment, and other infrastructure services such as servers and software resources for digital conversion production. They anticipate that there will be projects in the future that will require additional funds to support special needs outside the core capacities, or to handle larger volume than usual. Harvard's Library Digital Initiative (LDI) is funded from internal funds, though of course the purpose of LDI is not collection development primarily but building infrastructure.

As it is, digitization costs at most libraries are borne by external funds, and the projects developed appeal to the intended source of funding, be it a Federal agency with stringent and inflexible grant conditions; private foundations that have a heuristic interest in projects; or donors and alumni who usually are contributing to the institution for eleemosynary purposes and often do so out of dedication to the institution and its mission per se. When asked about priorities for selection, many respondents remarked wryly that they digitize what they can get money to do, implying and even sometimes stating directly that their choices were skewed by funding considerations and did not serve pure scholarship or other core missions of the institution. However, it is also the case that what selectors, curators, and bibliographers thought to be of highest value differed from what the administration identified, because the two groups had differing views of where scholarship is tending, how sophisticated the users are, and what is of lasting import.

Some librarians expressed great concern about the fact that, as long as libraries are competing for outside funds to digitize, they will be stuck in the entrepreneurial phase in which collection development will be driven by strong personalities-those who are willing to compete for funds-and that some parts of the library's collections will go untapped simply because the subject specialist in that area is not "the entrepreneurial type." Others express more serious concerns about the fate of non-English language materials, and even greater anxiety about the neglect of non-Roman collections.

Concerns about the changing role of library staff, above all of bibliographers, come up with increasing frequency. Bibliographers are increasingly diverted from traditional collection development duties to spend more time selecting for digitization (what might be called reselecting), something that is bound to have some effect on current collection development of traditional materials. A topic far more widely discussed is where to find the skill sets that are needed for digital library development. If libraries cannot afford to hire such expertise-as increasingly they cannot-how are they to go about growing it from within the organization without robbing Peter to pay Paul? The same is true for preservation staff, who see themselves and their funding diverted from preserving deteriorating collections to creating digital versions of materials that are often not at imminent risk of deterioration.

A number of special collections librarians have noted that, as items from the collections are digitized and given visibility, more people have become interested in the items and in related, not-yet-digitized materials. The numbers of onsite users and of phone, letter, and e-mail inquiries have risen. This in turn places increased physical stress on original materials as they are used more frequently, and it increases the workload on staff, especially on preservation and reference staff.

While reliable and meaningful cost data about digitization are rare and not often useful in comparative contexts, costing out the elements of digitizing would mean beginning with selection and going to physical preparation, cataloging, physical capture, creation of metadata, mounting and managing files, designing and maintaining the site, providing additional user services, and going through to implementing a long-term preservation strategy. Virtually every step involves human intervention and skill, and these costs, unlike those of storage, for example, are unlikely to go down.


In an exercise not yet seen in the United States, Oxford University libraries recently looked at their experiences with digital conversion in an attempt to identify what benefits it had brought to the library and its patrons. [citation] Curiously, one of the chief benefits cited was to curators, who learned an enormous amount about their own collections and about those of other colleges. This seems an expensive way to break down barriers between departments and other library units, or to make curators more familiar with their collections. But the report is alluding to one aspect of how the very role of bibliographer and curator is changing as a result of putting collections online. This is similar to reports on various campuses across the United States that digital projects are crucial for staff training and for developing expertise, though the matter of collection expertise is not often mentioned; rather, the benefit cited concerns the other aspect of so-called e-curatorship, in which staff develop technical and editorial or interpretive expertise. (This does not always benefit the library or university in the end, because expert staff are difficult to retain and many managers complained that expenses incurred in training staff served to benefit the staff's next employer.)

Benefits to users are also commonly cited, though as noted, systematically gathered evidence of user satisfaction is hard to come by. Managers also cite benefits to the collections through creation of surrogates that protect original materials while increasing access to the content.


What is clear by now is that digitization must be an integral part of the core mission work of the library to be a sustainable activity over time. Whereas the majority of research libraries engaged in digitization have been able to raise external funds for conversion, they all recognize the hazards of relying on these funds for long. There is no such thing as a free building. Even were a donor to pay for all aspects of the construction, from land acquisition to furnishing, at some point the ongoing maintenance costs will become the responsibility of the home institution and the building must meet minimum criteria for support.

The same holds true of digitized collections. Over the next few years we will see some libraries that have done digital projects essentially phase them out or reduce this activity to the exception rather than the rule. Others, committed to large-scale digital projects either as a part of collection management or as a commitment to extending access, will continue and begin to address the tough questions of finding internal funds or developing fee-based services to support conversion, maintenance, and service.


Be clear about the purpose of the project

Begin with a clear statement about whether or not the library will maintain the surrogates and for how long

Either focus on materials that are already organized, or target materials not under intellectual control and include funding for this in the project budget

Work with faculty and scholars to develop controlled vocabularies in those fields lacking them, to aid in creating metadata

Clarify the audience for a collection: do not attempt to be all things to all audiences

Secure funding for and conduct user assessments of digitized collections and make the information from these assessments available to the library community

Develop a policy for the disposition of reformatted materials

V. SOURCES [incomplete]

NYPL, Planning Digital Projects for Historical Collections, with its own bibliography

RLG DigiNews LC plan:

Oxford Scoping Project: accessed on 4 January 2001.

Dale Flecker, "Harvard's Digital Library Initiative: Building a First Generation Digital Library Infrastructure," at: Accessed on 4 January 2001.

OAC: then projects then OAC]

Michigan digital conversion selection policy

Digital Scriptorium

APIS Advanced Papyrological Information System

Emory Women Writers

Hoagy Carmichael Web site

de Stefano 2000, "Selection for Digital Conversion in Academic Libraries,"

1 Literature on copyright abounds; among the most useful in program planning is by Melissa Levine in NEDCC's [citation]