|We feel that any
viable digital stewardship organization that accepts content from such varied
sources and that does not conform to de facto preservation “requirements”
must find a way to minimize unpredictability in its workflows in order to be
efficient, especially if the repository intends to offer levels of service
beyond bitstream preservation.
|We decided that the
best way to minimize unpredictability was to identify and gauge the risks
associated with preserving digital content prior to its ingestion. It was clear that
we needed some framework in which to make agreements between the repository
and its clients (the content owners) about preservation commitments. Such a
framework would also serve to inform and guide the development of repository
services, e.g., metadata encoding, pre-ingestion transformation, long-term
format migration and delivery, as well as simple bit preservation.
|A few years ago, a small
team of folks at Stanford set about researching file formats and the assessment
of arbitrarily created digital objects, and conducted a close evaluation of
technical metadata for image and text formats.
|The research project
resulted in a questionnaire designed to serve as a data collection tool, and
to be used in a conversation between repository staff and the content depositor.
|In its initial form,
the questionnaire was of limited utility. First of all, its scope was
restricted to text and image formats. Beyond that, it served mostly to gauge
and curb human expectations about long-term prospects for files that are
the target of preservation services. It required manual analysis on a
file-by-file basis, so data collection was time-consuming and awkward, and the
results were not yet machine-actionable. While it felt like the right start,
it was clearly inefficient and incomplete.