Summary report of a meeting held on February 4-5, 2002 to consider requirements for an archivists' toolkit

Brad Westbrook
March 14, 2002

Background

Twenty-one archivists and information technologists met in La Jolla, Calif., on February 4-5, 2002, at a meeting sponsored by the Digital Library Federation (DLF) and the California Digital Library (CDL). The purpose of the meeting, known as the Archivists' Workbench meeting, was to discuss the concept of a workbench or suite of digital tools that would facilitate collection and management of information about archival materials at the various points along the life cycle of those collections. Ideally, a workbench would facilitate integration of the disparate filing systems and databases now used in most archival repositories for collecting and managing their archival information, and it would enable more efficient production of various outputs, ranging from encoded finding aids for use by end users to internal administrative reports.

Chief among the meeting's successfully met objectives was validation of a broad need for a digital toolkit that would:

Participants in the February meeting also discussed strengths and weaknesses of a variety of technological solutions that might serve as a platform on which to build this suite of tools. In addition, participants considered incomplete or unsuccessful efforts by the archival community during the late 1980s and early 1990s to construct a comprehensive data management utility, as well as the lessons derived from those efforts.

In light of lessons learned from previous unsuccessful attempts to build archival information systems, meeting participants concluded it was extremely important to focus the initial design of an archivists' workbench narrowly. Earlier attempts at the creation of such tools had failed in part because they aimed for comprehensiveness of process and participation at the outset. Participants in the Archivists' Workbench meeting decided it would be best to focus initial construction and application of the toolkit on a homogeneous group of repositories: smallish archives and special collection units in which one professional is typically responsible for most, if not all, of the archival work. This group was targeted because meeting participants believe such repositories lack the staffing resources to standardize their archival processes and contribute their descriptions and surrogates to consortial databases, and because publication of the archival materials administered by these repositories would greatly benefit the research community. In addition, such repositories represent a middle ground between the "lone processor" historical society and the multi-staffed manuscripts and archives units that exist at a few of the nation's research libraries. Workflows would be easier to discern in those environments, and it would be easier to build upon those results, presuming their success, to enlarge subsequent designs to include a broader range of repositories and more complicated workflows.

Participants in the February meeting also cautioned that this current effort to construct a suite of digital tools for archivists not become paralyzed at the outset by too grand a vision. They advised that a few key archival functions be targeted. That advice has been considered thoroughly in the aftermath of the February meeting and during the composition of this grant request. The planning process, for which funding is being requested with this proposal, will be devoted in large part to identifying those archival functions that are typical and related and, hence, could and should be accommodated in a toolkit. The objective is not to be comprehensive in the initial design but, rather, to make sure we allow collection of related data when it can be collected relatively easily and enable thorough use of all data collected. Another objective is to build the toolkit with an eye toward facilitating future modifications and extensions. In short, an accommodating design, and not a comprehensive design, is the target of the planning sessions. The particulars of that design will be the product of the planning sessions.

The meeting concluded with the commitment of twelve participants, known as the Archivist Workbench Core Team, to begin defining the functional requirements and system attributes of a workbench by elaborating and specifying the high-level requirements agreed to during the meeting and to join together in a planning process, the objective of which is to define a paper prototype of the archivists' workbench and secure funding for building and testing a working prototype.

DESIGN CONSIDERATIONS FOR AN ARCHIVISTS' WORKBENCH

First among the high-level requirements validated at the meeting is that the tool set needs to be informed by the life cycle of an archival collection or item as it progresses through a repository, from first contacts with a creator or donor of the archival materials through completion of the arrangement and description to use of the resource by the research community. However, while it is true that all collections or documents reflect the same basic life cycle, how that life cycle is articulated in one repository may differ in some ways from its articulation in another repository. Work may be sequenced one way in one repository and another way in another repository. One repository may cluster its data differently than does another repository. And one repository may choose not to collect information that another repository believes indispensable. Differing life cycle articulations can be due to different staffing levels and

Second, every archival function typically has two basic aspects. One aspect is the physical labor required to perform the function, such as transferring a set of boxes to the custody of the repository. The other aspect is the documentation or representation of the task and its results. Archival representation is the sum of the recording of the archival work of acquiring, processing, and servicing of archival materials. Historically, data generated from these events has been stored in a variety of locations, some digital (e.g., spreadsheets, databases, word processor files) and some analog (e.g., paper collection files, rolodexes, printed finding aids). As a consequence, the richness of this information and its myriad relationships has rarely been utilized to its fullest potential by archivists and curators.

Third, as demonstrated during the February meeting, there are significant differences across repositories regarding the sequence or workflow of the archival functions generating the representations, not to mention differences in how repositories represent each function (i.e., character and number of data elements). Meeting attendees agreed that an archivists' workbench would need to be flexible and adaptable to different work environments and able to accommodate different workflows. With minimal customizing, the suite of tools should be deployable on a single desktop in a one-person repository, or on a network serving a larger repository or even a consortium of repositories such as the Five Colleges or participants in CDL's Online Archive of California.

Meeting participants also agreed it was important for the toolkit to accommodate processes and workflows as established by individual repositories, since variance in institutional missions, staffing patterns, funding, and space are important determinants for how a repository represents and sequences its archival work. Accommodating a range of representational practices and workflows is complicated by the probability that not all archival repositories define their archival functions with the same delimiters. This state of affairs necessitates building flexibility into the toolkit that permits implementers to tailor it to their own needs but without compromising archival standards for content and structure (see note 1) that are imperative for developing broadly useful consortial access systems to archival resources. Inevitably, successful design and implementation of an archivists' workbench will require repositories to analyze their local practice and evaluate whether or not changes to those practices would be beneficial; however, the toolkit will enjoy even greater success if it can accommodate a wide range of those local practices and minimize the need for conformity to the toolkit.

The strong consensus reached in the February Archivists' Workbench meeting was that a modular design would best accommodate different work environments and workflows; hence, a blueprint for a suite of tools or toolkit would be the desired outcome of the planning phase of this project.

Modules determined by archival functions or predictable archival representation events allow for sequencing the modules in a manner that best conforms to the actual workflow employed in a given repository. In simple terms, a modular toolkit would consist of input templates and associated program code, storage data tables, and output formats and associated program code. The configuration of input screens would be determined by repository workflows, and they would funnel data to the storage data tables. These storage tables would not necessarily reflect boundaries or relationships suggested by the input templates. When the same data is required in the representation of different archival functions, it would be collected at the first available opportunity in the workflow, stored in a single location in the storage tables, and reused for representation of subsequent functions. Data would be entered and stored according to community content standards. For example, controlled access terms would be entered and stored in accord with the principles of the LC Name Authority File, the LC Subject Heading list, and other established thesauri. Data structure and transmission standards would be applied on export of information in one of the defined output routines. Output products would minimally include encoded and printed finding aids, standardized digital objects (MOA2 or METS), and cross-collection browse lists created by archivists in response to end user queries, but they could also include provisional MARC and DC cataloging records for the collection and selected sub-parts and a wide and diverse set of administrative reports such as shelf lists or periodic quantitative statements on major functions such as acquisition, digitization, or cataloging.
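The collect-once, store-once, reuse-on-output pattern described above can be illustrated with a minimal Python sketch. It is not a design produced by the meeting: the class names (CollectionRecord, Store), the field names, the sample identifier and values, and the two output routines are all hypothetical, and the XML fragment only borrows a few EAD element names for illustration and is not a valid EAD instance.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from xml.sax.saxutils import escape


@dataclass
class CollectionRecord:
    """One entry in the shared storage layer (hypothetical field names)."""
    identifier: str
    title: str = ""
    creator: str = ""   # assumed to be entered in authority-file form
    extent: str = ""
    subjects: List[str] = field(default_factory=list)


class Store:
    """Single storage location shared by all input and output routines."""

    def __init__(self) -> None:
        self._records: Dict[str, CollectionRecord] = {}

    def upsert(self, identifier: str, **fields) -> CollectionRecord:
        # Any input template may add fields; each datum lives in one place.
        record = self._records.setdefault(identifier, CollectionRecord(identifier))
        for name, value in fields.items():
            if not hasattr(record, name):
                raise AttributeError(f"unknown field: {name}")
            setattr(record, name, value)
        return record

    def get(self, identifier: str) -> CollectionRecord:
        return self._records[identifier]


def finding_aid_fragment(record: CollectionRecord) -> str:
    """Output routine 1: structure is applied only on export (illustrative XML)."""
    subjects = "".join(f"<subject>{escape(s)}</subject>" for s in record.subjects)
    return (
        f"<archdesc><unittitle>{escape(record.title)}</unittitle>"
        f"<origination>{escape(record.creator)}</origination>"
        f"<physdesc>{escape(record.extent)}</physdesc>"
        f"<controlaccess>{subjects}</controlaccess></archdesc>"
    )


def shelf_list_line(record: CollectionRecord) -> str:
    """Output routine 2: an administrative report reusing the same stored data."""
    return f"{record.identifier}\t{record.title}\t{record.extent}"


if __name__ == "__main__":
    store = Store()
    # Accession template: title and creator captured at the first opportunity.
    # All values below are invented sample data.
    store.upsert("MSS-0001", title="Smith & Jones Papers",
                 creator="Smith, Jane, 1900-1980")
    # Description template: adds extent and subjects later, re-enters nothing.
    store.upsert("MSS-0001", extent="12 boxes", subjects=["Oceanography"])
    record = store.get("MSS-0001")
    print(finding_aid_fragment(record))
    print(shelf_list_line(record))
```

Both output routines read the same stored record, so the title entered at accession appears in the finding aid fragment and in the shelf list without being keyed a second time.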

Effective delimitation of the modules, accompanied by sufficient documentation, should make the suite of tools capable of being implemented differently by different repositories, or of being modified by a single repository through time to reflect changes in the workflow pattern due to changing staff levels or repository goals. In addition, if modules are defined at high enough levels of granularity, it will be possible for modules to be combined in such a way that best reflects how archival functions are defined and represented in a specific repository. Finally, this design approach will enable repositories to use only those modules pertinent to their current workflow. Assuming, for example, that the toolkit includes a digital object production module, a repository not creating digital objects could elect not to use it at all or use it at a later date when the repository begins to create and upload digital objects.
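One way to picture this kind of per-repository configuration is a registry of modules that each repository sequences as it sees fit. The sketch below is only an assumption-laden illustration: the module names (accession, describe, digital_objects), the REGISTRY table, and the run_workflow function are hypothetical and do not come from the meeting.

```python
from typing import Callable, Dict, List

Module = Callable[[dict], dict]


def _mark(record: dict, step: str) -> dict:
    # Return a copy of the record noting that a module has handled it.
    return {**record, "steps": record.get("steps", []) + [step]}


def accession(record: dict) -> dict:
    return _mark(record, "accession")


def describe(record: dict) -> dict:
    return _mark(record, "describe")


def digital_objects(record: dict) -> dict:
    return _mark(record, "digital_objects")


# Hypothetical registry of the modules shipped with the toolkit.
REGISTRY: Dict[str, Module] = {
    "accession": accession,
    "describe": describe,
    "digital_objects": digital_objects,
}


def run_workflow(sequence: List[str], record: dict) -> dict:
    """Apply only the modules a repository lists, in the order it lists them."""
    for name in sequence:
        if name not in REGISTRY:
            raise KeyError(f"no such module: {name}")
        record = REGISTRY[name](record)
    return record


if __name__ == "__main__":
    # A repository not yet creating digital objects omits that module.
    small_archive = ["accession", "describe"]
    # Another repository describes first and also produces digital objects.
    larger_unit = ["describe", "accession", "digital_objects"]
    print(run_workflow(small_archive, {"id": "example-1"}))
    print(run_workflow(larger_unit, {"id": "example-2"}))
```

Under this arrangement, adopting the digital object module later means adding one name to the repository's sequence; the other modules are unchanged.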

Participants in the February Archivists' Workbench meeting clearly confirmed that the most pressing need at present is a tool to facilitate the output of encoded finding aids to enable online access to archival resources through repository websites and union databases. Nonetheless, participants also agreed that while efficient production of finding aids and other access products should be the primary rationale for building a toolkit, it should not be the sole objective for an archivists' workbench. Consideration should also be given to how the archival information might be re-used for other purposes already extant in archival repositories and how it could be adapted to future needs. The toolkit we envision incorporates finding aid production but looks well beyond it to include a greater range of functionality that could result in significant efficiencies for archival workers across the range of archival work and not just for finding aid encoding. For example, we envision a toolkit that, with some adaptation, could facilitate ingestion of electronic records and their associated metadata, as well as other kinds of born-digital materials.

A service and maintenance model is the final critical feature for an archivists' toolkit. Meeting participants concurred it would be folly to invest considerable resources in constructing a suite of digital tools and not address how the toolkit will be maintained and modified over time to keep current with technological developments and changes in archival work. A good service model would satisfy several basic requirements:

1 Anglo-American Cataloguing Rules, 2nd ed., and Archives, Personal Papers, and Manuscripts are the two notable content standards for formulating archival descriptive information and name entries. The Encoded Archival Description DTD is the most noteworthy structure standard for archival material, but the toolkit can be made to embrace other structure standards such as MARC and Dublin Core.