Summary report of a meeting held on February 4-5, 2002 to
consider requirements for an archivists' toolkit
Brad Westbrook
March 14, 2002
Background
Sponsored by the Digital Library Federation (DLF) and the
California Digital Library (CDL), twenty-one archivists and
information technologists met in La Jolla, Calif., on February
4-5, 2002. The purpose of the meeting, known as the Archivists'
Workbench meeting, was to discuss the concept of a workbench or
suite of digital tools that would facilitate collection and
management of information about archival materials at the various
points along the life cycle of those collections. Ideally, a
workbench would facilitate integration of the disparate filing
systems and databases now used in most archival repositories for
collecting and managing their archival information, and it would
enable more efficient production of various outputs, ranging from
encoded finding aids for use by end users to internal
administrative reports.
Chief among the meeting's successfully met objectives was
validation of a broad need for a digital toolkit that would:
- Create efficiencies in data capture and reuse at various
points in repository workflows;
- Reduce barriers to participation in consortial and
institutional access systems by making digital encoding for
online access a system byproduct rather than a complex additional
segment of staff work;
- Reduce educational requirements and training tasks by
automating complex encoding procedures and other kinds of work
routines;
- Increase application of data content and structure standards,
assuring greater interoperability of end-user access products
such as encoded finding aids and digital objects; and
- Integrate in one system, serving one or more archival
repositories, archival data typically dispersed across several
databases and filing systems, digital and analog.
Participants in the February meeting also discussed strengths
and weaknesses of a variety of technological solutions that might
serve as a possible platform on which to build this suite of
tools. In addition, participants considered incomplete or
unsuccessful efforts by the archival community during the late
1980s and early 1990s to construct a comprehensive data
management utility, as well as the lessons derived from those
efforts.
In light of lessons learned from previous unsuccessful
attempts to build archival information systems, meeting
participants concluded it was extremely important to keep the
initial design of an archivists' workbench narrowly focused. Earlier
attempts at the creation of such tools had failed in part because
they aimed for comprehensiveness of process and participation at
the outset. Participants in the Archivists' Workbench meeting
decided it would be best to focus initial construction and
application of the toolkit to a homogeneous group of
repositories, smallish archives and special collection units in
which one professional is typically responsible for most, if not
all, of the archival work. This group was targeted because
meeting participants believed such repositories lack the
staffing resources to standardize their archival processes and
contribute their descriptions and surrogates to consortial
databases and because publication of the archival materials
administered by these repositories would greatly benefit the
research community. In addition, such repositories represent a
middle ground between the "lone processor" historical society and
the multi-staffed manuscripts and archives units that exist at a
few of the nation's research libraries. Workflows would be easier
to discern in those environments, and it would be easier to build
upon those results, presuming their success, to enlarge
subsequent designs to include a broader range of repositories and
more complicated workflows.
Participants in the February meeting also cautioned that this
current effort to construct a suite of digital tools for
archivists not become paralyzed at the outset due to too grand a
vision. They advised that a few key archival functions be
targeted. That advice has been considered thoroughly in the
aftermath of the February meeting and during the composition of
this grant request. The planning process, for which funding is
being requested with this proposal, will be devoted in large part
to identifying those archival functions that are typical and
related and, hence, could and should be accommodated in a
toolkit. The objective is not to be comprehensive in the initial
design but, rather, to make sure we allow collection of related
data when it can be collected relatively easily and enable
thorough use of all data collected. Another objective is to build
the toolkit with an eye toward facilitating future modifications
and extensions. In short, an accommodating design, and not a
comprehensive design, is the target of the planning sessions. The
particulars of that design will be the product of the planning
sessions.
The meeting concluded with the commitment of twelve
participants, known as the Archivist Workbench Core Team, to
begin defining the functional requirements and system attributes
of a workbench by elaborating and specifying the high-level
requirements agreed to during the meeting and to join together in
a planning process, the objective of which is to define a paper
prototype of the archivists' workbench and secure funding for
building and testing a working prototype.
DESIGN CONSIDERATIONS FOR AN ARCHIVISTS' WORKBENCH
First among the high-level requirements validated at the
meeting is that the tool set needs to be informed by the life
cycle of an archival collection or item as it progresses through
a repository, from first contacts with a creator or donor of the
archival materials through completion of the arrangement and
description to use of the resource by the research community.
However, while it is true that all collections or documents
reflect the same basic life cycle, how that life cycle is
articulated in one repository may differ in some ways from its
articulation in another repository. Work may be sequenced one way
in one repository and another way in another repository. One
repository may cluster its data differently than does another
repository. And one repository may choose not to collect
information that another repository believes indispensable.
Differing life cycle articulations can be due to different
staffing levels and
Second, every archival function typically has two basic
aspects. One aspect is the physical labor required to perform the
function, such as transferring a set of boxes to the custody of
the repository. The other aspect is the documentation or
representation of the task and its results. Archival
representation is the sum of the recording of the archival work
of acquiring, processing, and servicing of archival materials.
Historically, data generated from these events has been stored in
a variety of locations, some digital (e.g., spreadsheets,
databases, word processor files) and some analog (e.g., paper
collection files, rolodexes, printed finding aids). As a
consequence, the richness of this information and its myriad
relationships has rarely been utilized to its fullest potential
by archivists and curators.
Third, as demonstrated during the February meeting, there are
significant differences across repositories regarding the
sequence or workflow of the archival functions generating the
representations, not to mention differences in how repositories
represent each function (i.e., character and number of data
elements). Meeting attendees agreed that an archivists' workbench
would need to be flexible and adaptable to different work
environments and able to accommodate different workflows. With
minimal customizing, the suite of tools should be deployable on a
single desktop in a one-person repository, or on a network
serving a larger repository or even a consortium of repositories
such as the Five Colleges or participants in CDL's Online Archive
of California.
Meeting participants also agreed it was important for the
toolkit to accommodate processes and workflows as established by
individual repositories, since variance in institutional
missions, staffing patterns, funding, and space are important
determinants for how a repository represents and sequences its
archival work. Accommodating a range of representational
practices and workflows is complicated by the probability that
not all archival repositories define their archival functions
with the same delimiters. This state of affairs necessitates
building flexibility into the toolkit that permits implementers
to tailor it to their own needs but without compromising archival
standards for content and structure that are imperative for
developing broadly useful consortial access systems to archival
resources. Successful design and implementation of an archivists'
workbench will inevitably require repositories to analyze their
local practice and evaluate whether or not changes to those
practices would be beneficial; however,
the toolkit will enjoy even greater success if it can accommodate
a wide range of those local practices and minimize the need for
conformity to the toolkit.
The strong consensus reached in the February Archivists'
Workbench meeting was that a modular design would best
accommodate different work environments and workflows; hence, a
blueprint for a suite of tools or toolkit would be the desired
outcome of the planning phase of this project.
Modules determined by archival functions or predictable
archival representation events allow for sequencing the modules
in a manner that best conforms to the actual workflow employed in
a given repository. In simple terms, a modular toolkit would
consist of input templates and associated program code, storage
data tables, and output formats and associated program code. The
configuration of input screens would be determined by repository
workflows, and they would funnel data to the storage data tables.
These storage tables would not necessarily reflect boundaries or
relationships suggested by the input templates. When the same
data is required in the representation of different archival
functions, it would be collected at the first available
opportunity in the workflow, stored in a single location in the
storage tables and reused for representation of subsequent
functions. Data would be entered and stored according to
community content standards. For example, controlled access terms
would be entered and stored in accord with the principles of the
LC Name Authority File, the LC Subject Heading list and other
established thesauri. Data structure and transmission standards
would be applied on export of information in one of the defined
output routines. Output products would minimally include encoded
and printed finding aids, standardized digital objects (MOA2 or
METS), and cross-collection browse lists created by archivists in
response to end user queries, but they could also include
provisional MARC and DC cataloging records for the collection and
selected sub-parts and a wide and diverse set of administrative
reports such as shelf lists, or periodic quantitative statements
on major functions such as acquisition, digitization, or
cataloging.
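The capture-once, reuse-many pattern described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not a design produced by the meeting: the names `Store`,
`capture`, `accession_report`, and `finding_aid_stub`, the field
names, and the sample values are all hypothetical, and the finding
aid output is a simplified EAD-like fragment rather than valid EAD.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Store:
    """Hypothetical storage table: one record per collection, keyed by id."""
    records: dict = field(default_factory=dict)

    def capture(self, collection_id: str, **fields) -> None:
        # Each input template writes only the fields it collects;
        # all templates feed the same single record per collection.
        self.records.setdefault(collection_id, {}).update(fields)


def accession_report(store: Store, collection_id: str) -> str:
    """Administrative output: a one-line accession summary."""
    r = store.records[collection_id]
    return f"{collection_id}: {r['title']} ({r['extent']}), donor {r['donor']}"


def finding_aid_stub(store: Store, collection_id: str) -> str:
    """Access output: simplified EAD-like fragment (illustrative, not valid EAD)."""
    r = store.records[collection_id]
    return (
        "<archdesc>"
        f"<unittitle>{escape(r['title'])}</unittitle>"
        f"<physdesc>{escape(r['extent'])}</physdesc>"
        f"<scopecontent>{escape(r['scope'])}</scopecontent>"
        "</archdesc>"
    )


store = Store()
# Accession template (made-up sample data): the first opportunity in the
# workflow to record title, donor, and extent.
store.capture("MSS-0001", title="Smith Family Papers",
              donor="J. Smith", extent="12 boxes")
# Description template: adds a scope note only; title and extent
# entered at accession are reused rather than keyed again.
store.capture("MSS-0001", scope="Correspondence and diaries")

print(accession_report(store, "MSS-0001"))
print(finding_aid_stub(store, "MSS-0001"))
```

Both outputs draw on the same stored title and extent, so a
correction made once in the storage table would appear in every
report and finding aid generated afterward.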
Effective delimitation of the modules, accompanied by
sufficient documentation, should make the suite of tools capable
of being implemented differently by different repositories, or of
being modified by a single repository through time to reflect
changes in the workflow pattern due to changing staff levels or
repository goals. In addition, if modules are defined at high
enough levels of granularity, it will be possible for modules to
be combined in such a way that best reflects how archival
functions are defined and represented in a specific repository.
Finally, this design approach will enable repositories to use
only those modules pertinent to their current workflow. Assuming,
for example, that the toolkit includes a digital object
production module, a repository not creating digital objects
could elect not to use it at all or use it at a later date when
the repository begins to create and upload digital objects.
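One way to picture module selection and sequencing is as a
per-repository list of enabled modules applied in local workflow
order. The following Python sketch is illustrative only; the module
names, the `MODULES` registry, and the `run_workflow` function are
hypothetical and are not drawn from the meeting's requirements.

```python
from typing import Callable, Dict, List

# Hypothetical module type: a function that takes a record and
# returns a copy of the record with the module's own data added.
Module = Callable[[dict], dict]


def accession(record: dict) -> dict:
    return {**record, "accessioned": True}


def description(record: dict) -> dict:
    return {**record, "described": True}


def digital_objects(record: dict) -> dict:
    return {**record, "digitized": True}


# Registry of all modules shipped with the (hypothetical) toolkit.
MODULES: Dict[str, Module] = {
    "accession": accession,
    "description": description,
    "digital_objects": digital_objects,
}


def run_workflow(enabled: List[str], record: dict) -> dict:
    """Apply only the enabled modules, in the repository's own order."""
    for name in enabled:
        record = MODULES[name](record)
    return record


# A repository not yet creating digital objects leaves that module out.
small_repo = run_workflow(["accession", "description"], {"id": "MSS-0001"})
# Later, the same repository can enable the module without other changes.
later = run_workflow(
    ["accession", "description", "digital_objects"], {"id": "MSS-0001"}
)

print(small_repo)
print(later)
```

Because each module touches only its own fields, a repository can
reorder or omit modules without altering the others.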
Participants in the February Archivists' Workbench meeting
clearly confirmed that the most pressing need at present is a
tool to facilitate the output of encoded finding aids to enable
online access to archival resources through repository websites
and union databases. Nonetheless, participants also agreed that
while efficient production of finding aids and other access
products should be the primary rationale for building a toolkit,
it should not be the sole objective for an archivists' workbench.
Consideration should also be given to how the archival
information might be re-used for other purposes already extant in
archival repositories and how it could be adapted to future
needs. The toolkit we envision incorporates finding aid
production but looks well beyond it to include a greater range of
functionality that could result in significant efficiencies for
archival workers across the range of archival work and not just
for finding aid encoding. For example, we envision a toolkit
that, with some adaptation, could facilitate ingestion of
electronic records and their associated metadata, as well as
other kinds of born digital materials.
A service and maintenance model is the final critical feature
for an archivists' toolkit. Meeting participants concurred it
would be folly to invest considerable resources in constructing a
suite of digital tools and not address how the toolkit will be
maintained and modified over time to keep current with
technological developments and changes in archival work. A good
service model would satisfy several basic requirements:
- Provide training for repositories in the use of the
toolkit;
- Provide ample documentation of all component parts of the
toolkit;
- Provide assistance to toolkit users with implementing and
customizing the input templates and output formats;
- Provide structure and procedure for updating the toolkit in a
timely and appropriate manner to keep pace with technological
evolutions; and
- Provide a mechanism for tracking all registered users so they
can be easily notified of new modifications and features.