1
Implementing the PREMIS Data Dictionary
  • Priscilla Caplan, OCLC
  • Rebecca Guenther, Library of Congress
  • Markus Enders, SUB Göttingen
  • Nancy Hoebelheinrich, Stanford Univ.


  • Digital Library Federation Spring Forum
  • April 10, 2006
2
Outline
  • Introduction to PREMIS data dictionary
  • Implementing the data dictionary in a repository
    • Issues
    • Implementing in DAITSS
    • Implementing in LMER
  • Implementing using XML in METS
    • Issues
    • Stanford Digital Repository
    • MathArc



3
PREMIS Data Dictionary
  • May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group


  • 237-page report includes:
    • PREMIS Data Dictionary 1.0
    • Accompanying report (context, data model, assumptions)
    • Special topics, glossary, usage examples

  • Data Dictionary: comprehensive, practical resource for implementing preservation metadata in digital archiving systems
    • Comprehensive view of information requirements needed to support digital preservation
    • Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation
    • Builds on previous work


  • XML schemas: Set of 5 XML schemas to support implementation


4
PREMIS Maintenance Activity
5
Some guiding principles and assumptions …
  • “Preservation metadata”: maintain viability, renderability, understandability, authenticity, identity in a preservation context
  • “Core”: What most preservation repositories need to know to preserve digital materials over the long-term
  • “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows
6
PREMIS data model
7
Semantic units pertaining to objects
  • objectIdentifier
  • preservationLevel
  • objectCategory
  • objectCharacteristics
  • creatingApplication
  • originalName
  • storage
  • environment




  • signatureInformation
  • relationship
  • linkingEventIdentifier
  • linkingIntellectualEntityIdentifier
  • linkingPermissionStatementIdentifier





8
objectCharacteristics

  • compositionLevel
  • fixity
  • size
  • format
  • significantProperties
  • inhibitors
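
A minimal Python sketch of how a few of these values might be gathered for one file. The function name `object_characteristics`, the dictionary layout, and the default algorithm are illustrative assumptions; the Data Dictionary defines the semantic units, not how they are stored.

```python
import hashlib
import os


def object_characteristics(path, algorithm="sha1"):
    """Collect size and fixity for one file, keyed by PREMIS semantic unit names.

    The nesting of the returned dictionary is a hypothetical local choice.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        # Assumption: the file is stored as is, not compressed or encrypted.
        "compositionLevel": 0,
        "fixity": {
            "messageDigestAlgorithm": algorithm,
            "messageDigest": digest.hexdigest(),
        },
        "size": os.path.getsize(path),  # size in bytes
    }
```

format, significantProperties, and inhibitors are left out because they generally need format identification tools or curatorial input rather than a simple file read.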
9
Semantic units pertaining to Events

  • eventIdentifier
  • eventType
  • eventDateTime
  • eventDetail
  • eventOutcome
  • eventOutcomeDetail
  • linkingAgentIdentifier
  • linkingObjectIdentifier
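
The Event semantic units could be held in a simple record type, sketched below in Python. This is an assumption-laden simplification: PREMIS identifiers are type/value pairs, whereas plain strings are used here, and the class name `Event` and the generated defaults are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Event:
    """One Event, with attributes named after the PREMIS semantic units."""

    eventType: str                      # e.g. a locally controlled value
    eventOutcome: str
    linkingObjectIdentifier: List[str]  # simplified: PREMIS uses type/value pairs
    linkingAgentIdentifier: List[str] = field(default_factory=list)
    eventDetail: Optional[str] = None
    eventOutcomeDetail: Optional[str] = None
    # Assumption: a UUID is an acceptable local event identifier.
    eventIdentifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: events are timestamped in UTC when the record is created.
    eventDateTime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

An ingest workflow might create `Event(eventType="ingestion", eventOutcome="success", linkingObjectIdentifier=["obj-1"])`, where the type and outcome strings are placeholder vocabulary values.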
10
Semantic units pertaining to Agents


  • agentIdentifier
  • agentName
  • agentType
11
Semantic units pertaining to Rights

    • permissionStatement
      • permissionStatementIdentifier
      • relatedObject
      • grantingAgent
      • grantingAgreement
      • permissionGranted
        • act
        • restriction
        • termOfGrant
        • permissionNote

12
Sample Data Dictionary entry
13
Data dictionary as implementation neutral

  • No assumptions on specific implementation
  • Promote flexibility/interoperability
  • Focus on semantic units: what you need to know (implementation-neutral) vs. metadata elements: how you record it (implementation-specific)
  • Information that needs to be “recoverable” from the digital archiving system, independent of local implementation
  • Options for SIP or DIP:
    • Use PREMIS XML schemas as is
    • Incorporate pieces of schemas into METS
    • Incorporate into another framework (e.g. DIDL)
14


Implementing the data dictionary in a repository
15
How PREMIS can be used
  • For systems in development
    • as a basis for metadata definition
  • For existing repositories
    • as a checklist for evaluation


  • “It seems that often people say they aren't ready to implement PREMIS yet, but they don't seem to realise they are already collecting some of the same information that PREMIS describes. The metadata is the same because it is often common sense that it is needed in a repository system. PREMIS can be useful to point out a few extra areas they perhaps hadn't thought of yet.”


16
Common implementation issues
  • Reconciling repository data model with PREMIS data model


  • What must be explicitly recorded locally, and what can be implicit?


  • Implementation in relational databases


  • How to create or obtain metadata values?
    • Role of registries
    • Need for tools

  • What values to use for controlled vocabularies?


  • Need to supplement with non-“core” metadata


  • How to represent metadata in a standard way in XML for SIPs and DIPs
17
Implementing in DAITSS
  • DAITSS = the preservation repository software used by the FCLA Digital Archive
    • locally developed, planning to release as open source
    • follows OAIS very strictly, designed for format migration
  • Developed concurrently with PREMIS WG deliberations
  • All metadata stored redundantly:
    • in MySQL database for fast access and reporting
    • in XML in stored AIP for system-independence
  • Depositors must create SIPs with METS format SIP descriptor
18
Data modeling issues
  • PREMIS has
    • Intellectual entity->representations->files->bitstreams
  • DAITSS has
    • Intellectual entity->files->bitstreams


  • PREMIS has
    • Objects, Events, Agents, Rights
  • DAITSS has
    • Objects, Events, Contacts (authorized individuals)


  • PREMIS has compositionLevel
  • DAITSS records some transformations as part of storage preparation
19
How values are obtained
  • objectCharacteristics for files and bitstreams obtained by program at Ingest by parsing files
    • when data in SIP descriptor doesn’t match automatically obtained values
  • storage assigned by program at Ingest
  • environment not recorded – we hope to take advantage of an environments registry some day
  • relationship data recorded by program at Ingest
    • what files are in an AIP
    • what bitstreams are in a file
    • different versions of a file (localized, migrated, etc.)
    • related events and linking events
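
Detecting a disagreement between depositor-supplied and program-derived values can be sketched as below. This is not DAITSS code: the function name `find_mismatches` and the flat dictionaries are hypothetical, and the sketch only reports differences without deciding which value wins.

```python
def find_mismatches(declared, observed):
    """Return {unit: (declared_value, observed_value)} where the two differ.

    `declared` holds values taken from the SIP descriptor; `observed` holds
    values obtained by parsing the file at Ingest. Units present on only one
    side are ignored rather than treated as mismatches.
    """
    return {
        unit: (declared[unit], observed[unit])
        for unit in declared
        if unit in observed and declared[unit] != observed[unit]
    }
```

For example, a descriptor that declares `image/tiff` for a file the parser identifies as `image/jpeg` would yield a single entry keyed by the format unit.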
20
Additional metadata not in PREMIS
  • Format-specific technical metadata
    • treated in DAITSS mostly as properties of bitstreams

  • Archive logic and billing


  • Information about “distributed” files


  • Source of file
    • deposited original, migrated, normalized, archive-created original, downloaded from Internet


21
For more information
  • FCLA Digital Archive site
    • http://www.fcla.edu/digitalArchive/index.htm


  • DAITSS information
    • http://www.fcla.edu/digitalArchive/soft.htm


  • DAITSS conformance to PREMIS
    • http://www.fcla.edu/digitalArchive/pdfs/PREMISConformance.pdf


  • Priscilla Caplan <pcaplan@ufl.edu>



22


Implementing PREMIS using XML in METS
23
Issues
  • Which METS sections to use
  • How to record elements that are also part of a format specific technical metadata schema (e.g. MIX)
  • Whether to record elements redundantly in PREMIS that are defined explicitly in the METS schema
  • Recording structural relationships
  • How to deal with locally controlled vocabularies
  • Whether to use the PREMIS container


24
PREMIS and METS sections
  • Flexibility of METS requires implementation decisions
  • Alternative 1
    • Object in techMD
    • Event in digiprovMD
    • Rights in rightsMD
    • Agent with event or rights
  • Alternative 2
    • Everything in digiprovMD
    • -or-
    • Everything in techMD
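
Alternative 1 can be sketched with Python's standard `xml.etree.ElementTree`. The PREMIS namespace URI, the `MDTYPE="OTHER"` / `OTHERMDTYPE="PREMIS"` labelling, and the helper name `wrap_premis` are assumptions made for illustration; check them against the schema versions actually in use.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "http://www.loc.gov/standards/premis/v1"  # assumed PREMIS 1.x namespace

ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)

# Alternative 1: which METS section wraps which PREMIS entity.
# Agents are omitted because they travel with their event or rights section.
SECTION_FOR_ENTITY = {
    "object": "techMD",
    "event": "digiprovMD",
    "rights": "rightsMD",
}


def wrap_premis(amd_sec, entity, section_id):
    """Append the METS section for `entity` and return an empty PREMIS element."""
    section = ET.SubElement(
        amd_sec, f"{{{METS_NS}}}{SECTION_FOR_ENTITY[entity]}", {"ID": section_id}
    )
    md_wrap = ET.SubElement(
        section,
        f"{{{METS_NS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": "PREMIS"},  # assumed labelling
    )
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    return ET.SubElement(xml_data, f"{{{PREMIS_NS}}}{entity}")
```

The METS schema expects sections inside an amdSec in the order techMD, rightsMD, sourceMD, digiprovMD, so callers should wrap the object first, then rights, then events.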
25
METS elements vs. PREMIS elements
  • CHECKSUM, CHECKSUMTYPE: attributes on <file> in METS
  • messageDigest, messageDigestAlgorithm in PREMIS Object


  • MIMETYPE: attribute on <file> in METS
  • <formatDesignation> in PREMIS more granular, may include MIMETYPE


  • Structural relationships detailed in structMap in METS
  • <relationship> semantic unit in PREMIS


  • What are the implementation advantages/disadvantages in recording redundantly?
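
One cost of recording redundantly is that the two copies can drift apart. The Python sketch below checks that a METS `<file>` checksum agrees with a PREMIS fixity block. The PREMIS namespace URI, the helper names, and the normalization rules (case and hyphens ignored in algorithm names) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "http://www.loc.gov/standards/premis/v1",  # assumed PREMIS 1.x namespace
}


def _normalize_algorithm(name):
    """Treat 'SHA-1', 'sha1', and ' Sha-1 ' as the same algorithm label."""
    return name.strip().upper().replace("-", "")


def checksum_agrees(file_el, premis_object_el):
    """True if any PREMIS fixity block matches the METS CHECKSUM attributes."""
    mets_pair = (
        _normalize_algorithm(file_el.get("CHECKSUMTYPE", "")),
        file_el.get("CHECKSUM", "").strip().lower(),
    )
    for fixity in premis_object_el.iter(f"{{{NS['premis']}}}fixity"):
        algorithm = fixity.findtext(
            "premis:messageDigestAlgorithm", default="", namespaces=NS
        )
        digest = fixity.findtext("premis:messageDigest", default="", namespaces=NS)
        if (_normalize_algorithm(algorithm), digest.strip().lower()) == mets_pair:
            return True
    return False
```

A repository that records both could run such a check at Ingest and at dissemination, and record the result as an event.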






26
URLs, etc.
  • PREMIS Maintenance Activity:
  • http://www.loc.gov/standards/premis/


  • PREMIS Working Group:
  • http://www.oclc.org/research/projects/pmwg/


  • Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group:
  • http://www.oclc.org/research/projects/pmwg/premis-final.pdf


  • Please send project information to Implementers’ Registry and join the PIG list!


  • pcaplan@ufl.edu enders@mail.sub.uni-goettingen.de
  • rgue@loc.gov nhoebel@stanford.edu