1
 
2
"Components of process"
  • Components of process
  • Results of process as recorded in METS using mdRef
  • “Collection-level” output created (but not included in METS)
3
 
4
 
5
"Format score matrix"
  • Format score matrix
    • Uses Fleischauer / Arms criteria for sustainability
    • Quality Policy Matrix
    • Stanford preferred formats


  • fileFormat.xsd (file format identification)
  • sdrFormatStatusDefaults.xml (default values for quality policies & preferred formats)
  • fileFormatStatus.xsd (scoring factors, format score value, preservation quality, policy status)
6
 
7
"2"
  • 2.  Preservation assessment categories of known risk factors
  • preservationAssessmentFlags.xsd (structures used to record the “red flags” to be searched for in JHOVE output)
  • preservationAssessmentFlags.xml (data about red flags used for input)
8
 
9
"3"
  • 3.  Stanford digital repository digital provenance output
  • sdrDigiprov.xsd (structures used to document digiprov information)
    • Output:
      • sdrDigiprov output for one file
      • sdrDigiprovSummary (summary for a “collection”)
10
"Results of preservation analysis recorded..."
  • Results of preservation analysis recorded in METS per file by:
    • Sequence of amdSec elements for all files in a collection containing:
      • JHOVE output
      • SDR Preservation Assessment output



11
"<amdSec ID="AMD_2.01.002.2.01..."
  • <amdSec ID="AMD_2.01.002.2.01">
  • <techMD ID="TECH_2.01.002.2.01" CREATED="2004-12-27T07:10:12">
  • <mdRef ID="JHOVE_2.01.002.2.01" LOCTYPE="URL" xlink:type="simple" xlink:href="./METADATA/AIHT-CONTENT/CONTRIBUTORS/1199_photos/wtc_web/jhove_2.01.002.2.01.xml" MDTYPE="OTHER" OTHERMDTYPE="jhove" LABEL="./AIHT-CONTENT/CONTRIBUTORS/1199_photos/wtc_web/WTC1.jpg"/>
  • </techMD>
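
The mdRef pattern above can be read back with a few lines of Python. This is only a minimal sketch: the wrapping `mets` root element with its namespace declarations is an assumption added so the fragment parses on its own, and `list_md_refs` is a hypothetical helper name, not part of the SDR tooling.

```python
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# The slide fragment, wrapped in an assumed <mets> root so it is well-formed.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <amdSec ID="AMD_2.01.002.2.01">
    <techMD ID="TECH_2.01.002.2.01" CREATED="2004-12-27T07:10:12">
      <mdRef ID="JHOVE_2.01.002.2.01" LOCTYPE="URL" xlink:type="simple"
             xlink:href="./METADATA/AIHT-CONTENT/CONTRIBUTORS/1199_photos/wtc_web/jhove_2.01.002.2.01.xml"
             MDTYPE="OTHER" OTHERMDTYPE="jhove"
             LABEL="./AIHT-CONTENT/CONTRIBUTORS/1199_photos/wtc_web/WTC1.jpg"/>
    </techMD>
  </amdSec>
</mets>"""


def list_md_refs(mets_xml):
    """Return one dict per mdRef: its ID, metadata type, external href, and label."""
    root = ET.fromstring(mets_xml)
    refs = []
    for md_ref in root.iter(f"{{{METS_NS}}}mdRef"):
        refs.append({
            "id": md_ref.get("ID"),
            # OTHERMDTYPE carries the real type when MDTYPE is "OTHER".
            "type": md_ref.get("OTHERMDTYPE") or md_ref.get("MDTYPE"),
            "href": md_ref.get(f"{{{XLINK_NS}}}href"),
            "label": md_ref.get("LABEL"),
        })
    return refs


for ref in list_md_refs(SAMPLE):
    print(ref["type"], ref["href"])
```

Each `href` is a relative path, so a consumer would still need to resolve it against the location of the METS document before opening the external metadata file.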


12
Example SDR Preservation Assessment output per file in METS
13
"EVENT instance"
  • EVENT instance
14
 
15
"<dig:agent>#"
  • <dig:agent>#1
  • <dig:agentIdentifier agentIdentifierValue="Empirical_Walker_v1" agentIdentifierScheme="SDR_SWAgent_v1"/>
16
"<dig:object"

  •  <dig:object>
  •  <dig:objectIdentifier objectIdentifierValue=".\test files\1015.pjpeg" objectIdentifierScheme="Collection_Relative_Path"/>
  • <dig:objectCharacteristics>
  •  <dig:compositionLevel>0</dig:compositionLevel>
  • <dig:fixity checkValue="ae45d7c040936f16d1eb0b47764fa0ba" checkMethod="MD5"/>
  • <dig:size>31137</dig:size>
  • </dig:objectCharacteristics>
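
The fixity and size values recorded above can be recomputed and compared at any later point. The following is a minimal sketch using only the Python standard library; `fixity` and `verify` are hypothetical helper names, and nothing here is taken from the SDR software itself.

```python
import hashlib


def fixity(path, chunk_size=65536):
    """Return (md5_hex_digest, size_in_bytes) for a file, reading it in chunks."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def verify(path, expected_md5, expected_size):
    """True when the file still matches the recorded checksum and size."""
    actual_md5, actual_size = fixity(path)
    return actual_md5 == expected_md5.lower() and actual_size == expected_size
```

For the file in the example, a check would pass the recorded values `"ae45d7c040936f16d1eb0b47764fa0ba"` and `31137` to `verify`.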
17
"</dig:format"
  • </dig:format>
  • <dig:formatStatus>
  • <fil:scoringFactors adoption="true" disclosure="true" transparency="true" selfDocumentation="false" externalDependencies="true"/>
  • <fil:formatScore scoringValue="1"/>
  • <fil:preservationQuality qualityValue="StatusQuality_High"/>
  • <fil:policyStatus policyValue="StatusPolicy_Approved"/>
  • </dig:format>
18
"<dig:preservationRisk"
  • <dig:preservationRisk>
  • <dig:flagRaised flagType="JPEG_ProgressiveEncoding">
  •  <dig:test feature="property" path="JPEGMetadata:Images:Image:Scans" compare="any value" datatype="integer"/>
  •  <dig:valuesFound>   <value>1</value>  </dig:valuesFound>
  •  <dig:recommendation action="RiskAssessment_TransformFormat">
  •  <transformation newFormatName="JPEG2000" transformationEffects="Preserves image data in a &quot;standard&quot; format expected to gain in preservation quality, and policy status, as adoption increases; benefits of progressive encoding may not be retained."/>
  • <transformation newFormatName="JPEG" transformationEffects="Image data retained in a format closer to de facto baseline; greater software compatibility; benefits of progressive encoding will not be retained in the conversion."/>
  • </dig:preservationRisk>
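
The "any value" test in the flag above can be illustrated with a small sketch. It assumes the JHOVE properties have already been loaded into nested Python dicts and lists keyed by property name, which is a simplification for illustration rather than the actual JHOVE output format. `find_values` and `check_flag` are hypothetical names, and only the "any value" comparison is handled.

```python
def find_values(node, path):
    """Collect every value reached by following a colon-separated property path."""
    current = [node]
    for key in path.split(":"):
        next_level = []
        for item in current:
            # A property may hold one child or a list of children.
            children = item if isinstance(item, list) else [item]
            for child in children:
                if isinstance(child, dict) and key in child:
                    next_level.append(child[key])
        current = next_level
    values = []
    for item in current:
        values.extend(item if isinstance(item, list) else [item])
    return values


def check_flag(properties, flag):
    """Return a raised-flag record, or None when the test does not match."""
    values = find_values(properties, flag["path"])
    # Only the "any value" comparison is sketched here.
    if flag["compare"] == "any value" and values:
        return {"flagType": flag["flagType"], "valuesFound": values}
    return None


# Flag definition mirroring the attributes shown in the example.
PROGRESSIVE = {
    "flagType": "JPEG_ProgressiveEncoding",
    "path": "JPEGMetadata:Images:Image:Scans",
    "compare": "any value",
}

# Assumed, simplified stand-in for parsed JHOVE properties.
sample = {"JPEGMetadata": {"Images": {"Image": [{"Scans": 1}]}}}
print(check_flag(sample, PROGRESSIVE))
```

A file whose properties contain no `Scans` entry yields `None`, so no flag is raised for it.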
19
"Other output information created with..."
  • Other output information created with potential for recording:
    • SDR Preservation analysis for “collection”
20
"Formats by format name"
  • Formats by format name
  • Format verification information (verified, by extension, mismatch, unknown)
  • Format score by preservation quality and status
  • Preservation risk by “red flags” raised per format type with reasons
  • File names of unknown formats & “red flags” raised   (sdrSummaryCollections.txt)
21
"Rationale for using mdRef"
  • Rationale for using mdRef:
    • A separate metadata file hierarchy mirrored the file hierarchy of the AIHT collection; this kept down the size of the METS documents (which were HUGE)
    • Provided more granularity in cases where the metadata for a single file needed to be edited or replaced
    • Allowed use of the more efficient access mechanisms provided by the filesystem instead of the slower access methods of current-generation XML tools

  • Disadvantages to export library:
    • Required many more resources to process
    • More difficult to manage the correlation of metadata per file with the inventory and logical and physical structures within the METS document
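
The mirrored hierarchy described above can be sketched as a simple path mapping. The naming rule used here (a `METADATA` root, the same relative directory, and a `jhove_<id>.xml` file name) is inferred from the single mdRef example in these slides and should be treated as an assumption; `jhove_metadata_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def jhove_metadata_path(content_path, file_id, metadata_root="METADATA"):
    """Map a content file path to the mirrored location of its JHOVE metadata file."""
    content = PurePosixPath(content_path)
    # Same relative directory as the content file, under the metadata root.
    return PurePosixPath(metadata_root) / content.parent / f"jhove_{file_id}.xml"


print(jhove_metadata_path(
    "./AIHT-CONTENT/CONTRIBUTORS/1199_photos/wtc_web/WTC1.jpg",
    "2.01.002.2.01",
))
```

Because the metadata file name is built from the identifier rather than the content file name, the METS document remains the only place where the two are correlated, which is the management cost noted in the last bullet.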

22
 
23
Back to Hannah…