Almost every aspect of repository
operation depends on the formats of the objects in the repository. The
Harvard University Library Digital Repository Service (DRS) has been in
production for four years and holds over 1.5 million digital objects
(7 TB) in managed storage.
A recent comparison of internal technical
metadata extracted from these objects with the external metadata supplied in
the objects' Submission Information Packages (SIPs) revealed some troubling
inconsistencies. Additionally, a small percentage of objects were found to be
invalid or malformed with respect to their formats. A post-mortem
investigation traced these problems to both human and system failures,
some of which were systemic for particular formats.
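The comparison described above can be illustrated with a minimal sketch that checks the format declared in a SIP against the format implied by a file's leading bytes ("magic number"). The signature table, the `detect_format` and `compare` names, and the `(object_id, declared_mime, content_bytes)` input shape are hypothetical illustrations. They are not the DRS implementation or any JHOVE API. A real validator such as JHOVE also inspects internal structure, so it can flag malformed objects that merely start with the right header.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical signature table: leading bytes -> MIME type.
# Only a handful of well-known signatures are listed for illustration.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF header
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF header
    (b"%PDF-", "application/pdf"),
]


def detect_format(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


@dataclass
class Discrepancy:
    object_id: str
    declared: str            # format stated in the SIP (external metadata)
    detected: Optional[str]  # format inferred from the content itself


def compare(objects: Iterable[Tuple[str, str, bytes]]) -> List[Discrepancy]:
    """Report every object whose declared format disagrees with its content."""
    problems = []
    for object_id, declared, data in objects:
        detected = detect_format(data)
        if detected != declared:
            problems.append(Discrepancy(object_id, declared, detected))
    return problems


# Illustrative input (made-up data): one consistent object, one mislabeled
# object, and one object whose content matches no known signature.
sample = [
    ("obj-1", "image/jpeg", b"\xff\xd8\xff\xe0\x00\x10JFIF"),
    ("obj-2", "image/tiff", b"\x89PNG\r\n\x1a\n\x00\x00"),
    ("obj-3", "image/png", b"not an image"),
]

for d in compare(sample):
    print(d)
```

Header checks like this are cheap enough to run before ingest, which is where such inconsistencies are easiest to correct.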
We report these findings and discuss the systems, now in place or under
development, for automated SIP construction and pre-ingest validation that
are intended to prevent such problems in the future. We also present an update on
JHOVE, the JSTOR/Harvard Object Validation Environment, a framework for
format-specific object identification, validation, and characterization.