Library of Congress
Audio-Visual Prototyping Project
Carl Fleischhauer
National Digital Library Program
DLF Forum November 19, 2000

National Audio-Visual Conservation Center
New Library of Congress facility, development led by the Motion Picture, Broadcasting, and Recorded Sound Division and National Digital Library Program
Support from the David and Lucile Packard Foundation and the Packard Humanities Institute
Will be in Culpeper, Virginia, 70 miles from Washington
Planned to become operational in 2004
Development and prototyping during 2000-2003

Project Motives 1
Alternative preservation approach
Analog magnetic recordings: tape-to-tape copying was never a very good idea
Cessation of manufacture of tape and tape recorders
Risk of deterioration of tangible born-digital, e.g., audio CDs
Opportunity to begin investigation of intangible born-digital content, e.g., MP3

Project Motives 2
Provide access
LC researchers on Capitol Hill
Collections in Culpeper
Possible future authorized research sites
Limited outreach; most items are protected by copyright

Project Themes
Content streams to preserve
Reformatting analog and tangible digital - major element
Processing intangible born digital - minor element
Preserving digital content from both streams
NOTE: Traditional preservation copying to continue until new approach is accepted

Reformatting Questions
Preservation quality bitstreams
Bitstream type?
PCM audio?  Bit-mapped images?  Video?!?  Native formats for tangible born digital?
File format?
Industry “standard” WAVE and TIFF?  CD-A to WAVE?
Quality level?
Samples per second?  Pixels per inch?  Bits per sample or pixel?  Accept native quality level?
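The quality-level questions translate directly into storage cost. A rough sketch of the arithmetic for uncompressed PCM audio and bit-mapped images follows; the 96 kHz / 24-bit and 300 ppi / 24-bit figures are illustrative assumptions, not project decisions.

```python
def pcm_bytes(samples_per_second, bits_per_sample, channels, seconds):
    """Size of the uncompressed PCM payload in bytes (file headers excluded)."""
    return samples_per_second * (bits_per_sample // 8) * channels * seconds


def bitmap_bytes(width_inches, height_inches, pixels_per_inch, bits_per_pixel):
    """Size of an uncompressed bit-mapped image in bytes (file headers excluded)."""
    pixels = (width_inches * pixels_per_inch) * (height_inches * pixels_per_inch)
    return pixels * bits_per_pixel // 8


# One hour of stereo audio at the native audio CD level: 44,100 samples/s, 16 bits.
cd_hour = pcm_bytes(44_100, 16, 2, 3600)

# One hour of stereo audio at a hypothetical higher level: 96,000 samples/s, 24 bits.
high_hour = pcm_bytes(96_000, 24, 2, 3600)

# One 8 x 10 inch page at a hypothetical 300 pixels per inch, 24 bits per pixel.
page = bitmap_bytes(8, 10, 300, 24)

print(cd_hour, high_hour, page)
```

Under these assumptions the higher audio level costs a little more than three times the storage of the CD level for the same hour of sound.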

Intangible Born-Digital Questions
Minor prototyping element today
Analyze
Unpack?  Reformat? Future system emulation?
“Transform into persistent object” (NARA/UCSD)

Metadata Takes Over
Technology projects seem to turn into metadata projects . . . .
Task A: Defining concepts and content elements within metadata
Task B: Input/output, capture/encode

Metadata Concepts 1: Categories
Bibliographic or intellectual metadata
What content is this?
Intellectual metadata supports discovery
Sample fields
title
creator (author)
publisher
subjects
original physical desc (for reformatted items)

Metadata Concepts 1: Categories
Administrative metadata
What do I need to know to manage this object?
Supports content preservation (e.g., archiving, migration, emulation)
Supports control of access; plans call for limiting copyrighted content to on-site use
More
Samples follow

Metadata Concepts 1: Categories
Sample Admin Metadata 1
For general administration
access category and/or rights information
about the source item that was reformatted
persistent name (URN, handle)
who/how digitized
who is responsible for object management

Metadata Concepts 1: Categories
Sample Admin Metadata 2
Supports migration and more
encryption, internet media type, file extension
checksum (to verify file integrity)
technical data re: bitstreams (“format metadata”)
sample rate, bit depth, color space, compression, targets, spatial resolution, pixels horizontal and vertical, watermark, and more
system emulation requirements (future)
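A minimal sketch of how a recorded checksum can be used to verify file integrity. The slides do not name a checksum algorithm; SHA-256 is an assumption made here, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Compute a hex digest for a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_checksum, algorithm="sha256"):
    """Return True when the file still matches the checksum stored in its metadata."""
    return file_checksum(path, algorithm) == recorded_checksum
```

The checksum would be computed once when the file is created, stored in the administrative metadata, and recomputed after each copy or migration.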

Metadata Concepts 1: Categories
Structural Metadata
How does this object fit together?
LC experience strongest re: reformatted content
Sample data
Express hierarchy
primary, intermediate, and terminal levels
Express relationships
“I am the high resolution version of page 3”
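One way to picture hierarchy and relationships together is a small tree whose lowest nodes carry the file versions. This is only a sketch: the class names, field names, and file names are hypothetical and are not taken from the LC database or from MOA2.

```python
from dataclasses import dataclass, field


@dataclass
class FileVersion:
    file_name: str
    role: str  # e.g. "high resolution" or "service"


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # lower-level nodes
    files: list = field(default_factory=list)     # file versions attached to this node


def walk(node, depth=0):
    """Yield (level, label) pairs: the root is primary, leaves are terminal."""
    if depth == 0:
        level = "primary"
    elif node.children:
        level = "intermediate"
    else:
        level = "terminal"
    yield level, node.label
    for child in node.children:
        yield from walk(child, depth + 1)


def relationships(node):
    """Yield one statement per file describing what it is a version of."""
    for version in node.files:
        yield f"{version.file_name}: I am the {version.role} version of {node.label}"
    for child in node.children:
        yield from relationships(child)


page3 = Node("page 3", files=[FileVersion("p3_master.tif", "high resolution"),
                              FileVersion("p3_service.jpg", "service")])
album = Node("album", children=[Node("booklet", children=[page3])])

for level, label in walk(album):
    print(level, label)
for statement in relationships(album):
    print(statement)
```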

Metadata Concepts 2: Reformatting Hierarchical Content
Illustration: pop music album
Two-sided 12-inch disc with printed labels
8 musical selections
4-page booklet (fake for our mockup!)
Album cover art and text front and back
Master and service reproductions
Total 32 or more digital files
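The total of 32 can be reached with a simple count. The slide does not itemize the files, so the breakdown below is one plausible accounting, assuming one image per disc label, one audio file per selection, one image per booklet page, one image per cover side, and exactly two reproductions (master and service) of each.

```python
# Assumed breakdown of the album into digitized components (not itemized in the source).
components = {
    "disc labels": 2,         # one per side of the disc
    "musical selections": 8,
    "booklet pages": 4,
    "cover images": 2,        # front and back
}
reproductions = ("master", "service")

total_files = sum(components.values()) * len(reproductions)
print(total_files)
```

Any additional derivative, such as a thumbnail image or a second service format, would push the count past 32, which fits the "or more" on the slide.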

Metadata Input/Output 1: Goals
Goals emerged during the work and are difficult to keep in sync
Database to capture metadata
structured for efficient data entry
“Archival” XML document for long-term retention
complete data set, “migratable” as needed
compare OAIS archival info package
“Presentation” XML document
streamlined for good fit to user interface
compare OAIS dissemination info package
copyrighted content, local client, no WWW
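A minimal sketch of the database-to-archival-document step, assuming a record has already been read from the database into nested dictionaries. The element names are hypothetical and do not follow the MOA2 DTD or any other real schema.

```python
import xml.etree.ElementTree as ET


def archival_document(record):
    """Serialize the complete metadata record, one XML section per metadata category."""
    root = ET.Element("object")
    for category, fields in record.items():
        section = ET.SubElement(root, category)
        for name, value in fields.items():
            ET.SubElement(section, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record with one field per category, for illustration only.
record = {
    "bibliographic": {"title": "Example album"},
    "administrative": {"sample_rate": 44100},
    "structural": {"level": "primary"},
}
print(archival_document(record))
```

Keeping every field in the archival document, including those the user interface never shows, is what leaves it complete enough to migrate later.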

Metadata Input/Output 2: Database
Access Database
Circa 150 fields in a dozen tables
Working to automate data grab from bib records, file headers, and directory lists of files
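A sketch of what an automated grab from file headers and directory lists might look like for WAVE files, using only the Python standard library. The function names and dictionary keys are hypothetical; they are not fields from the project database.

```python
import os
import wave


def grab_wave_header(path):
    """Read technical metadata from a WAVE file header."""
    with wave.open(path, "rb") as audio:
        return {
            "sample_rate": audio.getframerate(),
            "bit_depth": audio.getsampwidth() * 8,
            "channels": audio.getnchannels(),
            "frames": audio.getnframes(),
        }


def grab_directory(directory):
    """Build one metadata row per WAVE file found in a directory listing."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".wav"):
            continue  # this sketch handles WAVE files only
        full_path = os.path.join(directory, name)
        row = {"file_name": name, "size_bytes": os.path.getsize(full_path)}
        row.update(grab_wave_header(full_path))
        rows.append(row)
    return rows
```

Each row could then be loaded into the database in place of hand-keyed values.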

Metadata Input/Output 3: Archival Object
Concept from the Making of America 2 (MOA2) project at the University of California, Berkeley
Promising for archival object, contains complete set of metadata in migratable XML form
Current XML DTD needs to evolve/expand to embrace audio-visual elements

Metadata Input/Output 4: Presentation Object
We have a proof-of-concept interface client and a dataset built from a special XML document
Desire: the special XML document should be a subset of the MOA2 document type, derivable from it
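The "derivable subset" idea can be sketched as a filter over the archival document. The element names and the choice of which sections to keep are assumptions for illustration; they are not drawn from the MOA2 DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of sections that the user interface needs.
PRESENTATION_SECTIONS = {"bibliographic", "structural"}


def presentation_document(archival_xml):
    """Derive the presentation document by keeping only selected sections."""
    archival_root = ET.fromstring(archival_xml)
    presentation_root = ET.Element(archival_root.tag)
    for section in archival_root:
        if section.tag in PRESENTATION_SECTIONS:
            presentation_root.append(section)
    return ET.tostring(presentation_root, encoding="unicode")


archival_xml = (
    "<object>"
    "<bibliographic><title>Example album</title></bibliographic>"
    "<administrative><checksum>abc123</checksum></administrative>"
    "<structural><level>primary</level></structural>"
    "</object>"
)
print(presentation_document(archival_xml))
```

Because the output is produced only by dropping elements, every presentation document stays a strict subset of its archival source.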

Metadata Input/Output 5: What is Difficult?
Too darn many fields
150 “possibles” in our set at this time
includes reformatting documentation
Evolving fit with MOA2
must add new elements for audio-visual content; cannot use the DTD out of the box
Cumbersome data creation and transformation
proof-of-concept mode is “handmade”
transformation may be seen as OAIS ingestion, downstream of production

What about content preservation?

Content Preservation Today
For now, working in a UNIX storage network world
Masters in one set of filesystems, service copies in another
Essence bitstreams and XML in online or nearline storage are the “preservation copies”
Archived copies on offline media are “protection copies,” remade periodically
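With masters and service copies kept in separate filesystems, a simple audit can confirm that each master has a matching service copy. This sketch assumes, hypothetically, that a master and its service copy share a base file name and differ only in extension; the slides do not describe the actual naming scheme.

```python
import os


def base_names(directory):
    """Return the set of file names in a directory with extensions removed."""
    return {os.path.splitext(name)[0] for name in os.listdir(directory)}


def missing_service_copies(masters_dir, service_dir):
    """List masters that have no service copy with the same base name."""
    return sorted(base_names(masters_dir) - base_names(service_dir))
```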

Content Preservation Tomorrow
AV project produced conceptual design for repository
As this work proceeded, we studied two other models:
San Diego Supercomputer Center (University of California, San Diego) persistent archive design
OAIS reference model (we now borrow its general concepts and terminology)

OAIS Schematic

Content Preservation Tomorrow
Under way
defining ingestion and AIP(s)
defining access and DIP(s)
AV has special requirements
Deferred
study of core OAIS elements: archival storage and data management
general LC enterprise development planned

Web Sites
LC audio-visual prototyping project
http://lcweb.loc.gov/rr/mopic/avprot/avprhome.html
LC enterprise-wide Digital Repository planning
Features metadata tables
http://lcweb.loc.gov/standards/metadata.html
OAIS (Open Archival Information System)
http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
UCSD supercomputer data-intensive computing
http://www.npaci.edu/Research/DI/index.html

Thanks -- the end