Stanford Digital Repository
Preservation Assessment
“No Surprises, Please.”
• Conducted research into file formats
• Developed a questionnaire
• Identify and gauge risks in long-term preservation
• Establish a framework for decision-making
    • Preservation commitments
    • Repository planning
• Minimize unpredictability
We feel that any viable digital stewardship organization that accepts content that comes from such varied sources and that does not conform to de facto preservation “requirements” must find a way to minimize unpredictability in its workflows if it is to be efficient, especially if the repository intends to offer levels of service beyond bitstream preservation.

We decided that the best way to minimize unpredictability was to identify and gauge the risks associated with preserving digital content prior to its ingestion. It was clear that we needed some framework in which to make agreements between the repository and its clients (the content owners) about preservation commitments. Such a framework would also serve to inform and guide the development of repository services, e.g., metadata encoding, pre-ingestion transformation, long-term format migration and delivery, as well as simple bit preservation.

A few years ago, a small team at Stanford set about researching file formats and the assessment of arbitrarily created digital objects, and conducted a close evaluation of technical metadata for image and text formats.

The research project resulted in a questionnaire designed to serve as a data collection tool and to be used in a conversation between repository staff and the content depositor.

In its initial form, the questionnaire was limited in its utility. First of all, its scope was limited to text and image formats. Ultimately, it served mostly to gauge and curb human expectations about the long-term prospects for files that are the target of preservation services. It required manual analysis on a file-by-file basis, so data collection was time-consuming and awkward, and the results were not yet machine-actionable. While it felt like the right start, it was clearly inefficient and incomplete.