1
|
- Priscilla Caplan, OCLC
- Rebecca Guenther, Library of Congress
- Markus Enders, SUB Göttingen
- Nancy Hoebelheinrich, Stanford Univ.
- Digital Library Federation Spring Forum
- April 10, 2006
|
2
|
- Introduction to PREMIS data dictionary
- Implementing the data dictionary in a repository
- Issues
- Implementing in DAITSS
- Implementing in LMER
- Implementing using XML in METS
- Issues
- Stanford Digital Repository
- MathArc
|
3
|
- May 2005: Data Dictionary for Preservation Metadata: Final Report of the
PREMIS Working Group
- 237-page report includes:
- PREMIS Data Dictionary 1.0
- Accompanying report (context, data model, assumptions)
- Special topics, glossary, usage examples
- Data Dictionary: comprehensive, practical resource for implementing
preservation metadata in digital archiving systems
- Comprehensive view of information requirements needed to support
digital preservation
- Based on deep pool of institutional experiences in setting up and
managing operational capacity for digital preservation
- Builds on previous work
- XML schemas: Set of 5 XML schemas to support implementation
|
4
|
|
5
|
- “Preservation metadata”: maintain viability, renderability,
understandability, authenticity, identity in a preservation context
- “Core”: What most preservation repositories need to know to preserve
digital materials over the long term
- “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis
on automated workflows
|
6
|
|
7
|
- objectIdentifier
- preservationLevel
- objectCategory
- objectCharacteristics
- creatingApplication
- originalName
- storage
- environment
- signatureInformation
- relationship
- linkingEventIdentifier
- linkingIntellectualEntityIdentifier
- linkingPermissionStatementIdentifier
|
8
|
- compositionLevel
- fixity
- size
- format
- significantProperties
- inhibitors
|
9
|
- eventIdentifier
- eventType
- eventDateTime
- eventDetail
- eventOutcome
- eventOutcomeDetail
- linkingAgentIdentifier
- linkingObjectIdentifier
|
10
|
- agentIdentifier
- agentName
- agentType
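The linking identifiers listed for Objects, Events and Agents are what tie the entities together. A minimal sketch of that linkage, assuming plain strings for identifiers and a hypothetical helper `events_for_object`; only a few semantic units are modelled, and none of this is prescribed by the data dictionary:

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    agentIdentifier: str
    agentName: str
    agentType: str


@dataclass
class Event:
    eventIdentifier: str
    eventType: str
    eventDateTime: str
    # Links are kept as lists of identifiers (an assumption of this sketch).
    linkingAgentIdentifier: list = field(default_factory=list)
    linkingObjectIdentifier: list = field(default_factory=list)


def events_for_object(object_id, events):
    """Return the events whose linkingObjectIdentifier names this object."""
    return [event for event in events if object_id in event.linkingObjectIdentifier]
```

Because the links are held as identifiers rather than nested records, the same lookup works whether the entities live in database tables or in XML.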
|
11
|
- permissionStatement
- permissionStatementIdentifier
- relatedObject
- grantingAgent
- grantingAgreement
- permissionGranted
- act
- restriction
- termOfGrant
- permissionNote
|
12
|
|
13
|
- No assumptions on specific implementation
- Promote flexibility/interoperability
- Focus on semantic units: what you need to know (implementation-neutral)
vs. metadata elements: how you record it (implementation-specific)
- Information that needs to be “recoverable” from the digital archiving
system, independent of local implementation
- Options for SIP or DIP:
- Use PREMIS XML schemas as is
- Incorporate pieces of schemas into METS
- Incorporate into another framework (e.g. DIDL)
|
14
|
|
15
|
- For systems in development
- as a basis for metadata definition
- For existing repositories
- as a checklist for evaluation
- “It seems that often people
say they aren't ready to implement PREMIS yet, but they don't seem to
realise they are already collecting some of the same information that
PREMIS describes. The metadata is the same because it is often common
sense that it is needed in a repository system. PREMIS can be useful to
point out a few extra areas they perhaps hadn't thought of yet.”
|
16
|
- Reconciling repository data model with PREMIS data model
- What must be explicitly recorded locally, and what can be implicit?
- Implementation in relational databases
- How to create or obtain metadata values?
- Role of registries
- Need for tools
- What values to use for controlled vocabularies?
- Need to supplement with non-“core” metadata
- How to represent metadata in a standard way in XML for SIPs and DIPs
|
17
|
- DAITSS = the preservation repository software used by the FCLA Digital
Archive
- locally developed, planning to release as open source
- follows OAIS very strictly, designed for format migration
- Developed concurrently with PREMIS WG deliberations
- All metadata stored redundantly:
- in MySQL database for fast access and reporting
- in XML in stored AIP for system-independence
- Depositors must create SIPs with METS format SIP descriptor
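The redundant storage described above can be sketched as writing each record twice and checking that the two copies agree. This is an illustration only, not DAITSS code: SQLite stands in for MySQL, and the table name `data_file`, the `aipDescriptor` element and all function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """Create the database table and an empty XML descriptor."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE data_file (id TEXT PRIMARY KEY, size INTEGER, digest TEXT)"
    )
    return conn, ET.Element("aipDescriptor")


def record_file(conn, descriptor, file_id, size, digest):
    """Write one file's metadata to both copies."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO data_file (id, size, digest) VALUES (?, ?, ?)",
            (file_id, size, digest),
        )
    file_el = ET.SubElement(descriptor, "file", {"id": file_id})
    ET.SubElement(file_el, "size").text = str(size)
    ET.SubElement(file_el, "messageDigest").text = digest


def copies_agree(conn, descriptor):
    """Check that the database rows and the XML descriptor hold the same values."""
    from_db = {
        row[0]: (row[1], row[2])
        for row in conn.execute("SELECT id, size, digest FROM data_file")
    }
    from_xml = {
        el.get("id"): (int(el.findtext("size")), el.findtext("messageDigest"))
        for el in descriptor.findall("file")
    }
    return from_db == from_xml
```

The database copy serves queries and reports; the XML copy travels with the AIP and can be read without the database.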
|
18
|
- PREMIS has
- Intellectual entity->representations->files->bitstreams
- DAITSS has
- Intellectual entity->files->bitstreams
- PREMIS has
- Objects, Events, Agents, Rights
- DAITSS has
- Objects, Events, Contacts (authorized individuals)
- PREMIS has compositionLevel
- DAITSS records some transformations as part of storage preparation
|
19
|
- objectCharacteristics for files and bitstreams obtained by program at
Ingest by parsing files
- when data in SIP descriptor doesn’t match automatically obtained values
- storage assigned by program at Ingest
- environment not recorded – we hope to take advantage of an environments
registry some day
- relationship data recorded by program at Ingest
- what files are in an AIP
- what bitstreams are in a file
- different versions of a file (localized, migrated, etc.)
- related events and linking events
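Obtaining objectCharacteristics by program at Ingest might look roughly like the sketch below. It is not the DAITSS implementation: the function names are hypothetical, SHA-1 is an arbitrary choice of digest, compositionLevel is simply assumed to be 0, and the format guess uses the file extension where a real ingest would parse the file.

```python
import hashlib
import mimetypes
import os


def object_characteristics(path):
    """Derive a few objectCharacteristics values by reading the file itself."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Extension-based guess only; stands in for real format identification.
    guessed_type, _ = mimetypes.guess_type(path)
    return {
        "compositionLevel": 0,  # assumed: not compressed or encrypted
        "fixity": {
            "messageDigestAlgorithm": "SHA-1",
            "messageDigest": digest.hexdigest(),
        },
        "size": os.path.getsize(path),
        "format": guessed_type or "unknown",
    }


def mismatches(declared, derived):
    """Map each key whose SIP-declared value differs to (declared, derived)."""
    return {
        key: (value, derived.get(key))
        for key, value in declared.items()
        if derived.get(key) != value
    }
```

`mismatches` only reports where the SIP descriptor and the derived values differ; what the repository does with a mismatch is a policy decision the sketch leaves open.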
|
20
|
- Format-specific technical metadata
- treated in DAITSS mostly as properties of bitstreams
- Archive logic and billing
- Information about “distributed” files
- Source of file
- deposited original, migrated, normalized, archive-created original,
downloaded from Internet
|
21
|
- FCLA Digital Archive site
- http://www.fcla.edu/digitalArchive/index.htm
- DAITSS information
- http://www.fcla.edu/digitalArchive/soft.htm
- DAITSS conformance to PREMIS
- http://www.fcla.edu/digitalArchive/pdfs/PREMISConformance.pdf
- Priscilla Caplan <pcaplan@ufl.edu>
|
22
|
|
23
|
- Which METS sections to use
- How to record elements that are also part of a format specific technical
metadata schema (e.g. MIX)
- Whether to record elements redundantly in PREMIS that are defined
explicitly in the METS schema
- Recording structural relationships
- How to deal with locally controlled vocabularies
- Whether to use the PREMIS container
|
24
|
- Flexibility of METS requires implementation decisions
- Alternative 1
- Object in techMD
- Event in digiprovMD
- Rights in rightsMD
- Agent with event or rights
- Alternative 2
- Everything in digiprovMD
- -or-
- Everything in techMD
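Alternative 1 can be sketched with Python's standard XML library. The PREMIS namespace URI, the `MDTYPE="OTHER"` / `OTHERMDTYPE="PREMIS"` wrapper attributes and the helper name `wrap_premis` are assumptions made for illustration; the METS schema spells the provenance section `digiprovMD`, so that name is used here. Agents are left out because their placement depends on whether they accompany an event or a rights statement.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "http://www.loc.gov/standards/premis/v1"  # assumed PREMIS 1.x namespace

ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)

# Alternative 1: the METS section that holds each PREMIS entity.
SECTION_FOR_ENTITY = {
    "object": "techMD",
    "event": "digiprovMD",
    "rights": "rightsMD",
}


def wrap_premis(amd_sec, entity, section_id):
    """Add the METS section for a PREMIS entity and return the empty PREMIS element."""
    section = ET.SubElement(
        amd_sec, f"{{{METS_NS}}}{SECTION_FOR_ENTITY[entity]}", {"ID": section_id}
    )
    md_wrap = ET.SubElement(
        section,
        f"{{{METS_NS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": "PREMIS"},  # assumed wrapper attributes
    )
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    return ET.SubElement(xml_data, f"{{{PREMIS_NS}}}{entity}")
```

Sections are appended in call order, so callers should add techMD, then rightsMD, then digiprovMD to match the sequence the METS schema expects inside an amdSec.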
|
25
|
- CHECKSUM, CHECKSUMTYPE: attributes on <file> in METS
- messageDigest, messageDigestAlgorithm in PREMIS Object
- MIMETYPE: attribute on <file> in METS
- <formatDesignation> in PREMIS more granular, may include MIMETYPE
- Structural relationships detailed in structMap in METS
- <relationship> semantic unit in PREMIS
- What are the implementation advantages/disadvantages in recording
redundantly?
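One practical cost of recording redundantly is that the two copies can drift apart. A minimal sketch of a consistency check, assuming PREMIS objects sit in techMD sections that files reference through ADMID, and assuming the PREMIS namespace URI shown; `checksum_conflicts` is a hypothetical helper, and it compares digest values only, not the algorithm names:

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
PREMIS = "{http://www.loc.gov/standards/premis/v1}"  # assumed PREMIS 1.x namespace


def checksum_conflicts(mets_root):
    """Return (file ID, METS CHECKSUM, PREMIS messageDigest) for each disagreement."""
    digest_by_section = {}
    for tech_md in mets_root.iter(METS + "techMD"):
        digest = tech_md.find(f".//{PREMIS}messageDigest")
        if digest is not None and digest.text:
            digest_by_section[tech_md.get("ID")] = digest.text.strip().lower()

    conflicts = []
    for file_el in mets_root.iter(METS + "file"):
        declared = (file_el.get("CHECKSUM") or "").lower()
        # ADMID may list several section IDs separated by whitespace.
        for section_id in (file_el.get("ADMID") or "").split():
            recorded = digest_by_section.get(section_id)
            if declared and recorded and declared != recorded:
                conflicts.append((file_el.get("ID"), declared, recorded))
    return conflicts
```

An empty result means every file whose checksum appears in both places agrees; files that carry the value in only one place are not reported.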
|
26
|
- PREMIS Maintenance Activity:
- http://www.loc.gov/standards/premis/
- PREMIS Working Group:
- http://www.oclc.org/research/projects/pmwg/
- Data Dictionary for Preservation Metadata: Final Report of the PREMIS
Working Group:
- http://www.oclc.org/research/projects/pmwg/premis-final.pdf
- Please send project information to Implementers’ Registry and join the
PIG list!
- pcaplan@ufl.edu enders@mail.sub.uni-goettingen.de
- rgue@loc.gov nhoebel@stanford.edu
|