Almost all aspects of repository operation are conditioned by the format of the objects in the repository. The Harvard University Library Digital Repository Service (DRS) has been in production operation for four years and has over 1.5 million digital objects (7 TB) under managed storage. A recent comparison of internal technical metadata extracted from these objects with the external metadata supplied in the objects' Submission Information Packages (SIPs) revealed some troubling inconsistencies. Additionally, a small percentage of objects were found to be invalid or malformed with respect to their formats. A post-mortem investigation determined that the causes of these problems included both human and system failures, in some instances on a systemic basis with regard to format. We report on the findings of this effort and discuss systems that are now in place and under development for automated SIP construction and pre-ingest validation intended to mitigate such problems in the future. We also present an update on JHOVE, the JSTOR/Harvard Object Validation Environment, useful for format-specific object identification, validation, and characterization.
