1

The PREMIS Working Group:
Preservation Metadata
for Digital Repositories
  • DLF Fall Forum
  • October 26, 2004
  • Rebecca Guenther
  • LC/NDMSO
  • rgue@loc.gov
2
Preservation Metadata Functions
  • Information that supports and documents the digital preservation process:
    • Establishes provenance: tracks chain of custody and alterations over time
    • Details authenticity
    • Documents technical processes object has undergone
    • Describes technical details of object
    • Describes the environment from which it originated
    • Specifies rights management information

3
Preservation Metadata Functions (cont.)

  • Provide information to maintain resources over the long term:
    • viability: object’s bitstream is intact
    • renderability: object can be translated to a form that can be viewed or used
    • understandability: rendered content can be interpreted and understood
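
A minimal sketch of how the viability check might be automated, assuming a digest is recorded as preservation metadata when the object is ingested. The function names and the choice of SHA-256 are illustrative assumptions, not part of the working group's recommendations.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a bitstream."""
    return hashlib.sha256(data).hexdigest()


def is_viable(data: bytes, recorded_digest: str) -> bool:
    """True when the current bitstream still matches the digest recorded at ingest."""
    return sha256_digest(data) == recorded_digest


# Hypothetical object content; in practice this would be read from storage.
original = b"example bitstream"
recorded = sha256_digest(original)  # stored as preservation metadata at ingest
```

A single flipped byte changes the digest, so is_viable(original, recorded) holds while any altered copy fails the check. Renderability and understandability cannot be verified this way; they depend on format and environment information.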


4
Background
  • March 2000: OCLC and RLG jointly sponsor international working group on preservation metadata
    • Identify key issues/challenges
    • Seek consensus on recommendations and best practice

  • White paper (January 2001)
    • Defined preservation metadata; role in preservation process
    • Reviewed/synthesized existing preservation metadata schemes

  • Preservation metadata framework (June 2002)
    • Comprehensive description of types of information constituting preservation metadata
    • Based on OAIS information model
    • Set of “prototype” preservation metadata elements
5
Aftermath …
  • Framework …
    • Consolidated expertise
    • Provided foundation for developing formal preservation metadata specifications
    • Common departure point for different schema implementations

  • But ... further scope for collaboration in preservation metadata
    • Needed best practices/recommendations for implementing preservation metadata in real world digital archiving systems
6
Issues unresolved in WG
  • How minimal is a core preservation metadata element set?
  • How much metadata can be generated automatically?
  • Is it useful to apply metadata elements by object type or object behavior?
  • Levels of granularity not addressed
  • Need to provide less abstract view of preservation metadata for implementation


7
PREMIS
  • June 2003: OCLC and RLG sponsored new working group: PREMIS
    • Preservation Metadata: Implementation Strategies

  • Objectives
    • Define “core” set of preservation metadata elements, with supporting data dictionary, applicable to broad range of digital preservation activities
    • Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata

8
Membership
  • Priscilla Caplan, FCLA (Co-chair)
  • Rebecca Guenther, LC (Co-chair)
  • Michael Alexander, British Library
  • George Barnum, GPO
  • Charles Blair, U. of Chicago
  • Olaf Brandt, U. of Göttingen
  • Adam Farquhar, British Library
  • David Gewirtz, Yale
  • Kevin Glavash, MIT/DSpace
  • Cathy Hartman, U. of N. Texas
  • Helen Hodgart, British Library
  • Nancy Hoebelheinrich, Stanford
  • Roger Howard/Sally Hubbard, Getty Museum
  • Pam Kircher, OCLC
  • John Kunze, Calif. Digital Library
  • Brian Lavoie, OCLC liaison
  • Robin Dale, RLG liaison
  • Vicky McCarger, LA Times
  • Jerry McDonough, NYU/METS
  • Evan Owens, JSTOR
  • Erin Rhodes, NARA
  • Madi Solomon, Walt Disney Co.
  • Angela Spinazze, ATSPIN
  • Stefan Strathmann, U. of Göttingen
  • Gunter Waibel, RLG
  • Lisa Weber, NARA
  • Robin Wendler, Harvard
  • Hilde van Wijngaarden, KB
  • Andrew Wilson, NAA
9
Advisory Committee
  • Howard Besser, UCLA
  • Liz Bishoff, OCLC (via Colorado Digitization Program)
  • Gerard Clifton, National Library of Australia
  • Gail Hodge, CENDI
  • Steve Knight, National Library of New Zealand
  • Maggie Jones, Digital Preservation Coalition
  • Nancy McGovern, Cornell
  • Cliff Morgan, Wiley UK
  • Richard Rinehart, U. of California, Berkeley


10
PREMIS Subgroups
  • Core elements
    • Establish core metadata elements and data dictionary
    • Developed a data model
    • Has had 2 face-to-face meetings
    • Weekly conference calls
  • Implementation
    • Examine alternative strategies for encoding, storage and management of preservation metadata
    • Conducted a survey of practices
    • Monthly conference call
  • Expect to complete activities by end of 2004
11
Core elements subgroup
  • Development of data model
    • Objects
    • Events
    • Agents
    • Intellectual entities
    • Rights
  • Data dictionary structured according to entities
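
One way to picture a data dictionary structured by entity is a lookup keyed on the five entity types. The sketch below is an assumption about layout only: it lists just the semantic units named elsewhere in these slides, and the Python names (Entity, DATA_DICTIONARY, units_for) are hypothetical.

```python
from enum import Enum
from typing import List


class Entity(Enum):
    """The five entity types of the data model."""
    OBJECT = "object"
    EVENT = "event"
    AGENT = "agent"
    INTELLECTUAL_ENTITY = "intellectual entity"
    RIGHTS = "rights"


# Semantic units grouped by entity. Only units named in these slides appear;
# empty lists mark entities whose units the slides do not enumerate.
DATA_DICTIONARY = {
    Entity.OBJECT: ["identifier", "location", "fixity", "size", "format"],
    Entity.EVENT: [
        "event identifier",
        "event type",
        "event outcome",
        "event detail",
        "event date/time",
    ],
    Entity.AGENT: ["agent identifier", "agent name"],
    Entity.INTELLECTUAL_ENTITY: [],
    Entity.RIGHTS: [],
}


def units_for(entity: Entity) -> List[str]:
    """Return a copy of the semantic units recorded for one entity."""
    return list(DATA_DICTIONARY[entity])
```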


12
Core Elements
  • Conducting element-by-element review of prototype elements from metadata framework
    • Is the element “core”?
    • How is it being used at WG members’ institutions?
    • How should it be implemented/populated?
    • Elements not covered by the framework?

13
Objects
  • Identifiers
  • Location
  • Descriptive metadata out of scope
  • Technical metadata not specific to particular file format
  • Levels of objects: representation, file, filestream, bitstream
14
Objects: Technical metadata
  • Object characteristics
    • Fixity
    • Size
    • Format (including link to format registry)
    • Inhibitors
    • Significant properties
    • Creating application information
  • Environment (software, hardware)
  • Externally defined technical metadata (e.g. Z39.87/MIX)
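
Several of the object characteristics above (fixity, size) can be derived from the bitstream itself, which bears on the question of how much metadata can be generated automatically. The sketch below assumes a hypothetical record layout; the field names and the use of SHA-256 are illustrative, and format identification is taken as an input rather than detected.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectCharacteristics:
    """Format-independent technical metadata for one file or bitstream."""
    fixity_algorithm: str
    fixity_digest: str
    size_bytes: int
    format_name: str
    format_registry_key: Optional[str] = None  # link into a format registry, if known


def characterize(
    data: bytes, format_name: str, registry_key: Optional[str] = None
) -> ObjectCharacteristics:
    """Compute fixity and size from the bytes; format is supplied by the caller."""
    return ObjectCharacteristics(
        fixity_algorithm="sha256",
        fixity_digest=hashlib.sha256(data).hexdigest(),
        size_bytes=len(data),
        format_name=format_name,
        format_registry_key=registry_key,
    )


# Hypothetical sample content standing in for a stored file.
record = characterize(b"%PDF-1.4 sample", "PDF 1.4")
```

Inhibitors, significant properties, and environment generally need human judgment or external tools, so they are left out of this sketch.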


15
Events
  • Digital provenance/process information
  • Actions that involve one or more objects
  • May be related to one or more agents
  • Semantic units
    • Event identifier
    • Event type
    • Event outcome
    • Event detail
    • Event date/time
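
The event semantic units listed above map naturally onto a small record type. This is a sketch under assumptions: the class, the identifier values, and the rule enforcing at least one linked object are illustrative choices that follow the bullets on this slide, not a published schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Event:
    """One preservation action, carrying the semantic units from the slide."""
    event_identifier: str
    event_type: str
    event_outcome: str
    event_detail: str
    event_datetime: datetime
    object_ids: List[str] = field(default_factory=list)  # one or more objects
    agent_ids: List[str] = field(default_factory=list)   # optional agents

    def __post_init__(self) -> None:
        # An event is an action that involves one or more objects.
        if not self.object_ids:
            raise ValueError("an event must involve at least one object")


# Hypothetical identifiers and values, for illustration only.
event = Event(
    event_identifier="evt-0001",
    event_type="fixity check",
    event_outcome="success",
    event_detail="digest matched the stored value",
    event_datetime=datetime(2004, 10, 26, tzinfo=timezone.utc),
    object_ids=["obj-0001"],
    agent_ids=["agent-0001"],
)
```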




16
Agents
  • Agent descriptions out of scope
  • Attributes of agents associated with preservation events and rights management
    • May carry out, authorize, or compel one or more events
    • May create or act upon one or more objects
    • May hold or grant one or more rights
  • Semantic units
    • Agent identifier
    • Agent name
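
Because agent descriptions are out of scope, an agent record can stay very small, with its role expressed on the link to an event. The sketch below assumes hypothetical names (AgentRole, AgentEventLink, agents_for_event) and sample identifiers; the three roles come from the bullets above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AgentRole(Enum):
    """Ways an agent may relate to an event."""
    CARRIES_OUT = "carries out"
    AUTHORIZES = "authorizes"
    COMPELS = "compels"


@dataclass(frozen=True)
class Agent:
    agent_identifier: str
    agent_name: str


@dataclass(frozen=True)
class AgentEventLink:
    agent_identifier: str
    event_identifier: str
    role: AgentRole


# Hypothetical agents and links, for illustration only.
agents = {
    "agent-0001": Agent("agent-0001", "Ingest service"),
    "agent-0002": Agent("agent-0002", "Repository manager"),
}
links = [
    AgentEventLink("agent-0001", "evt-0001", AgentRole.CARRIES_OUT),
    AgentEventLink("agent-0002", "evt-0001", AgentRole.AUTHORIZES),
]


def agents_for_event(event_identifier: str, role: AgentRole) -> List[str]:
    """Names of the agents linked to an event in the given role."""
    return [
        agents[link.agent_identifier].agent_name
        for link in links
        if link.event_identifier == event_identifier and link.role is role
    ]
```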
17
Rights and relationships
  • Rights
    • Only in context of right to preserve
    • Collecting rights use cases
  • Relationships
    • Data model expresses relationships between entities
    • Relationships between objects
      • Derivative, dependency, structural
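
The three kinds of relationship between objects can be recorded as typed links between object identifiers. The following is a minimal sketch; the identifiers, the direction of each link, and the helper name are assumptions made for illustration.

```python
from enum import Enum
from typing import List, NamedTuple


class RelationshipType(Enum):
    DERIVATIVE = "derivative"
    DEPENDENCY = "dependency"
    STRUCTURAL = "structural"


class Relationship(NamedTuple):
    source_id: str
    rel_type: RelationshipType
    target_id: str


# Hypothetical objects: a master image, a derived access copy, and a page
# that depends on a stylesheet in order to render.
relationships = [
    Relationship("file-tiff-001", RelationshipType.STRUCTURAL, "rep-001"),
    Relationship("file-jpeg-001", RelationshipType.DERIVATIVE, "file-tiff-001"),
    Relationship("file-html-001", RelationshipType.DEPENDENCY, "file-css-001"),
]


def related(object_id: str, rel_type: RelationshipType) -> List[str]:
    """Targets linked from object_id by the given relationship type."""
    return [
        r.target_id
        for r in relationships
        if r.source_id == object_id and r.rel_type is rel_type
    ]
```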
18
 
19
 
20
Implementation Strategies subgroup
  • Conducted survey of preservation repositories to explore the state of the art
  • Questions about policies, governance, funding, system architecture, preservation strategies, metadata implementation
  • 70 surveys sent
  • Responses from 28 libraries, 7 archives, and 14 other institutions, in 13 countries
  • 10 national libraries, 6 national archives
  • Survey published Oct. 2004
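
As a quick check on the figures above, the response rate can be worked out directly. This assumes the three respondent categories are disjoint and together cover every response, which the slide does not state explicitly.

```python
surveys_sent = 70
responses_by_type = {"libraries": 28, "archives": 7, "other": 14}

# Assumes no respondent is counted in more than one category.
total_responses = sum(responses_by_type.values())
response_rate = total_responses / surveys_sent

print(f"{total_responses} responses, {response_rate:.0%} response rate")
```

Under that assumption the survey drew 49 responses, a 70% response rate.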
21
Survey findings
  • Little experience with digital preservation
    • Most didn’t have active preservation strategy
    • Many not yet in production
    • Cannot assess adequacy of metadata
  • Lack of common vocabulary and conceptual framework
    • Informed by OAIS reference model
    • Difference of opinion as to meaning of OAIS compliance
22
Survey findings (cont.)
  • Metadata
    • Many recording rights, provenance, technical, administrative, descriptive and structural
  • Consistent roles in preservation scope and policies (academic libraries, archives, national libraries)
  • Substantial use of METS, Z39.87/MIX, OCLC sets
  • Most repositories serve goals of both preservation and access


23
Trends
  • Store metadata redundantly in XML or relational database and with content data objects
  • Use METS for structural metadata and as container for descriptive and administrative; MIX for images
  • Use OAIS as framework and starting point
  • Maintain multiple versions (originals, some normalized or migrated) in repository with complete metadata for all versions
  • Choose multiple strategies for digital preservation
24
Looking ahead
  • Finalize core preservation metadata elements set
  • Complete data dictionary
  • XML schemas to support exchange of core elements for digital provenance/process and technical metadata
  • Final PREMIS report by end of 2004
  • Community outreach: opportunities for public comment
  • Follow-on activities?
25
More information…
  • PREMIS Web site:
    • http://www.oclc.org/research/projects/pmwg/

  • “Implementing Metadata in Digital Preservation Systems: The PREMIS Activity,” D-Lib Magazine (April 2004)
    • http://www.dlib.org/dlib/april04/lavoie/04lavoie.html

  • Rebecca Guenther: rgue@loc.gov


  • Priscilla Caplan: pcaplan@ufl.edu