Library of Congress
Audio-Visual Prototyping Project

Carl Fleischhauer
National Digital Library Program
DLF Forum, November 19, 2000

National Audio-Visual Conservation Center

New Library of Congress facility, development led by the Motion Picture, Broadcasting, and Recorded Sound Division and National Digital Library Program
Support from the David and Lucile Packard Foundation and the Packard Humanities Institute
Will be in Culpeper, Virginia, 70 miles from Washington
Planned to go operational 2004
Development and prototyping during 2000-2003

Project Motives 1

Alternative preservation approach
  Analog magnetic recordings: tape-to-tape copying was never a very good idea
  Cessation of manufacture of tape and tape recorders
  Risk of deterioration of tangible born-digital content, e.g., audio CDs
  Opportunity to begin investigation of intangible born-digital content, e.g., MP3

Project Motives 2

Provide access
  LC researchers on Capitol Hill
  Collections in Culpeper
  Possible future authorized research sites
  Limited outreach, most items protected by copyright

Project Themes

Content streams to preserve
  Reformatting analog and tangible digital - major element
  Processing intangible born digital - minor element
Preserving digital content from both streams

NOTE: Traditional preservation copying to continue until new approach is accepted

Reformatting Questions

Preservation quality bitstreams
  Bitstream type?
    PCM audio? Bit-mapped images? Video?!? Native formats for tangible born digital?
  File format?
    Industry “standard” WAVE and TIFF? CD-A to WAVE?
  Quality level?
    Samples per second? Pixels per inch? Bits per sample or pixel? Accept native quality level?

Intangible Born-Digital Questions

Minor prototyping element today
Analyze
Unpack? Reformat? Future system emulation?
“Transform into persistent object” (NARA/UCSD)

Metadata Takes Over

Technology projects seem to turn into metadata projects . . . .

Task A: Defining concepts and content elements within metadata
Task B: Input/output, capture/encode

Metadata Concepts 1: Categories

Bibliographic or intellectual metadata
  What content is this?
  Intellectual metadata supports discovery
  Sample fields
    title
    creator (author)
    publisher
    subjects
    original physical description (for reformatted items)

Metadata Concepts 1: Categories

Administrative metadata
  What do I need to know to manage this object?
  Supports content preservation (e.g., archiving, migration, emulation)
  Supports control of access; plans call for limiting copyrighted content to on-site use
  More
  Samples follow

Metadata Concepts 1: Categories

Sample Admin Metadata 1
  For general administration
    access category and/or rights information
    about the source item that was reformatted
    persistent name (URN, handle)
    who/how digitized
    who is responsible for object management

Metadata Concepts 1: Categories

Sample Admin Metadata 2
  Supports migration and more
    encryption, internet media type, file extension
    checksum (to verify file integrity)
    technical data re: bitstreams (“format metadata”)
      sample rate, bit depth, color space, compression, targets, spatial resolution, pixels horizontal and vertical, watermark, and more
    system emulation requirements (future)
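
The checksum field listed above can be illustrated with a minimal sketch. It assumes SHA-256 as the digest and uses hypothetical helper names (file_checksum, verify_file); the slides do not say which algorithm or tooling the project used.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks.

    The algorithm choice is an assumption made for this illustration.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large audio or image masters
        # do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, recorded_checksum, algorithm="sha256"):
    """Compare a freshly computed digest with the value stored in metadata."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

A repository could record the digest when a file is created and call verify_file again after each copy or migration.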

Metadata Concepts 1: Categories

Structural Metadata
  How does this object fit together?
  LC experience strongest re: reformatted content

Sample data
  Express hierarchy
    primary, intermediate, and terminal levels
  Express relationships
    “I am the high resolution version of page 3”
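
The hierarchy and relationship ideas above can be sketched as a small tree. Everything here is hypothetical and chosen for illustration: the Node class, its field names, and the file names are not taken from the project's database or from any DTD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    level: str                        # "primary", "intermediate", or "terminal"
    file_name: Optional[str] = None   # only terminal nodes point at a file
    role: Optional[str] = None        # relationship, e.g. "high resolution"
    version_of: Optional[str] = None  # what the file is a version of
    children: List["Node"] = field(default_factory=list)


def terminal_files(node):
    """Walk the hierarchy depth-first and list files at terminal nodes."""
    if node.level == "terminal":
        return [node.file_name]
    files = []
    for child in node.children:
        files.extend(terminal_files(child))
    return files


def describe(node):
    """State a terminal node's relationship in the slide's own phrasing."""
    return f"I am the {node.role} version of {node.version_of}"


# Hypothetical fragment: one booklet page held as two reproductions.
booklet = Node("booklet", "intermediate", children=[
    Node("page 3 master", "terminal", "p3_master.tif", "high resolution", "page 3"),
    Node("page 3 service", "terminal", "p3_service.jpg", "low resolution", "page 3"),
])
album = Node("album", "primary", children=[booklet])

print(terminal_files(album))
print(describe(booklet.children[0]))
```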

Metadata Concepts 2: Reformatting Hierarchical Content

Illustration: pop music album
  Two-sided 12-inch disc with printed labels
  8 musical selections
  4-page booklet (fake for our mockup!)
  Album cover art and text front and back
  Master and service reproductions
  Total 32 or more digital files
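
One way to arrive at the minimum of 32 files is sketched below. The breakdown is an assumption: one file per disc label, per musical selection, per booklet page, and per cover face, each made as both a master and a service reproduction. The slide gives only the total.

```python
# Assumed breakdown of the album into reformatted components.
components = {
    "disc labels (2 sides)": 2,
    "musical selections": 8,
    "booklet pages": 4,
    "cover faces (front and back)": 2,
}
reproductions = ("master", "service")

# Each component is assumed to exist once per reproduction type.
minimum_files = sum(components.values()) * len(reproductions)
print(minimum_files)  # 16 components x 2 reproductions = 32
```

Any further derivative, such as a thumbnail image, would account for the "or more".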

Metadata Input/Output 1: Goals

Emerged in process, difficult to synch
Database to capture metadata
  structured for efficient data entry
“Archival” XML document for long-term retention
  complete data set, “migratable” as needed
  compare OAIS archival info package
“Presentation” XML document
  streamlined for good fit to user interface
  compare OAIS dissemination info package
  copyrighted content, local client, no WWW

Metadata Input/Output 2: Database

Access Database
Circa 150 fields in a dozen tables
Working to automate data grab from bib records, file headers, and directory lists of files
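
As an illustration of grabbing technical metadata from file headers, the sketch below reads a WAVE header with Python's standard wave module. The function name and the dictionary keys are hypothetical, and this is not the project's actual tooling, which the slides do not describe.

```python
import wave


def wave_format_metadata(path):
    """Read basic format metadata from a WAVE file header.

    The key names are illustrative, not fields from the project's database.
    """
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            # getsampwidth() returns bytes per sample.
            "bit_depth": wav.getsampwidth() * 8,
            "frame_count": wav.getnframes(),
            "compression": wav.getcomptype(),
        }
```

The returned values correspond to the sample rate, bit depth, and compression items in the format metadata listed earlier, so they could be loaded into the database rather than typed by hand.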

Metadata Input/Output 3: Archival Object

Concept from the Making of America 2 project at Cal Berkeley
Promising for archival object, contains complete set of metadata in migratable XML form
Current XML DTD needs to evolve/expand to embrace audio-visual elements

Metadata Input/Output 4: Presentation Object

We have a proof-of-concept interface client and dataset from special XML document
Desire: the special XML document should be a subset of the MOA2 document type, derivable from it
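
The idea of deriving a streamlined presentation document from a complete archival one can be sketched as follows. The element names, attribute names, and sample values are all hypothetical; they are not taken from the MOA2 DTD or from the project's documents, and "subset" here means only that the output carries a selection of the archival content.

```python
import xml.etree.ElementTree as ET

# Hypothetical archival document; not the MOA2 document type.
ARCHIVAL_XML = """
<object id="album-001">
  <descriptive>
    <title>Sample Album</title>
    <creator>Sample Artist</creator>
  </descriptive>
  <administrative>
    <checksum>0123abcd</checksum>
    <digitizedBy>Lab A</digitizedBy>
  </administrative>
  <file use="master" href="track01_master.wav"/>
  <file use="service" href="track01_service.mp3"/>
</object>
"""


def derive_presentation(archival_root):
    """Copy only what a user interface needs: descriptive fields
    and pointers to service copies. Administrative data is left out."""
    presentation = ET.Element("object", {"id": archival_root.get("id", "")})
    descriptive = archival_root.find("descriptive")
    if descriptive is not None:
        for child in descriptive:
            ET.SubElement(presentation, child.tag).text = child.text
    for file_el in archival_root.findall("file"):
        if file_el.get("use") == "service":
            ET.SubElement(presentation, "file", {"href": file_el.get("href", "")})
    return presentation


presentation = derive_presentation(ET.fromstring(ARCHIVAL_XML.strip()))
print(ET.tostring(presentation, encoding="unicode"))
```

Because the presentation document is computed from the archival one, it can be regenerated whenever the archival document or the interface changes.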

Metadata Input/Output 5: What is Difficult?

Too darn many fields
  150 “possibles” in our set at this time
  includes reformatting documentation
Evolving fit with MOA2
  add new elements for audio-visual content, cannot use DTD out of the box
Cumbersome data creation and transformation
  proof-of-concept mode is “handmade”
  transformation may be seen as OAIS ingestion, downstream of production

What about content preservation?

Content Preservation Today

For now, working in a UNIX storage network world
Masters in one set of filesystems, service copies in another
Essence bitstreams and XML in online or nearline storage are the “preservation copies”
Archived copies on offline media are “protection copies,” remake periodically

Content Preservation Tomorrow

AV project produced conceptual design for repository
As this work proceeded, we studied two other models:
  University of California, San Diego Supercomputer Center Persistent Archive Design
  OAIS reference model (we now borrow its general concepts and terminology)
OAIS Schematic

Content Preservation Tomorrow

Under way
  defining ingestion and AIP(s)
  defining access and DIP(s)
  AV has special requirements
Deferred
  study of core OAIS elements: archival storage and data management
  general LC enterprise development planned

Web Sites

LC audio-visual prototyping project
  http://lcweb.loc.gov/rr/mopic/avprot/avprhome.html
LC enterprise-wide Digital Repository planning
  Features metadata tables
  http://lcweb.loc.gov/standards/metadata.html
OAIS (Open Archival Information System)
  http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
UCSD supercomputer data-intensive computing
  http://www.npaci.edu/Research/DI/index.html

Thanks -- the end