The famous “OAIS Diagram” shows us a
stylized high-level view of an archive. [Bearing in mind the fundamental fact
that “the map is not the terrain,”] we can differentiate five kinds of
features: the blue “functional entity groups”, the “information packages” in
the white ovals, three external generalized Actors, some solid and dotted
lines suggesting relationships among the blue entity groups and the actors,
and some arrows implying a directed data flow. If this diagram has never
really made sense to you, you’re not alone. We have to look deeper to
understand how this model describes all archives.
Over the past few years we have become familiar with
the terms "Ingest", "Access", "Archival Storage",
"Preservation Planning", "Administration", and "Data
Management" as the components of an OAIS archive. And we've learned to talk about
metadata in terms of the OAIS Information Package: the SIP, AIP, and DIP. If
understanding these terms were all there is to creating an archive, we would all
be practicing good long-term preservation of our digital assets. Of course,
there is more to the OAIS
reference model than these basic concepts, and from the beginning of the EATMOT
project we knew we needed
a deeper understanding of them to build a working digital archive.
We understand the OAIS Reference Model to be a
collection of all the functions
that take place in the ideal, full-service archive. The group that conceived the model
identified approximately thirty functions: negotiating the agreement between a
depositor and the archive; various kinds of compliance and error checking;
generating reports; billing; managing, storing, and retrieving data; and
many more. Some of the functions are purely technical, such as replacing
storage media, but just as many are knowledge processes: monitoring the needs
of the user community, planning preservation strategies, managing the system
configuration, establishing archive policies, among others. The OAIS
document's "Composite of Functional Entities" diagram lays all of these functions
out in a data-flow diagram that can confuse even an experienced archivist (see Fig. 1). Looking at this
diagram, we wondered where we were
going to find a place to start unraveling this complexity and building our
archive. We knew we could not “eat the
elephant” in a single
bite (byte?), but instead we’d have to divide it up in relatively small, easily consumable
Extreme Programming (XP) proponents advocate four values in system development:
communication, feedback, simplicity, and courage. They envision the
development process as a communication loop that involves the customer, the
requirements, the system designer, and the programmer. One of the areas where they
look for simplicity is in deciding what tasks need to be done next. The
customers, the designers, and the programmers look at the system requirements
(as they know them at the moment). The customers decide which small
parts of the system are most important to them at the moment and write a
description on "story cards". The designers and programmers then
estimate how long implementing each "story" will take to complete. Knowing what
is needed next and how long each task might take, they choose the
stories that the programmers will work on next. When they have completed those
parts of the system, they all repeat the story card process. (Courage is
an XP value because all the participants have to have the courage to jump right
in and make decisions that might later prove to be wrong--the stories may
not have been the right ones to work on: requirements may change, designs
may not work, programmers may have to rewrite components.)
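The story card loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool we or the XP community actually use: the `StoryCard` class, the `choose_next_stories` function, and all priorities and estimates are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class StoryCard:
    title: str
    priority: int         # customers' ranking, 1 = most important (hypothetical scale)
    estimate_days: float  # designers' and programmers' estimate (made-up numbers)


def choose_next_stories(cards, budget_days):
    """Walk the cards in priority order and keep each one that still fits the time budget."""
    chosen = []
    remaining = budget_days
    for card in sorted(cards, key=lambda c: c.priority):
        if card.estimate_days <= remaining:
            chosen.append(card)
            remaining -= card.estimate_days
    return chosen


# Illustrative deck; the numbers are invented for this sketch.
cards = [
    StoryCard("Update Storage", priority=1, estimate_days=10),
    StoryCard("Generate Report", priority=3, estimate_days=4),
    StoryCard("Replace Media", priority=2, estimate_days=8),
]
next_up = choose_next_stories(cards, budget_days=15)
print([c.title for c in next_up])
```

When the chosen stories are finished, the team would write or re-rank cards and call the function again, which mirrors repeating the story card process.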
We adapted this technique to prioritizing our
understanding of the complete OAIS model. We considered each of us to be a customer
in the XP sense; all of
us have been involved in digital libraries for some years and know our institutions' current
requirements for a digital archive. The Cornell team that developed the stories
includes an archivist, a systems technology administrator, a metadata librarian, and a system
designer. We put each OAIS function
on a 3x5 card, along with a brief description. Each member of the team received a deck
of cards and pulled out the cards he/she considered to be important. As a group, we discussed each story and
decided on three we wanted
to start on. We then refined those
I’m going to use this story, the Update Storage story, to illustrate how we’re
eating the elephant one bite at a time.
The next slide is a simplification, showing only the four functions we’re focusing
on in this story.
The arrows in this drawing show the
relationships among the functions—which functions talk with each other or pass
each other messages of one sort or another.
You’ll notice that one of the arrows is bi-directional;
it indicates that some sort of request and response conversation goes on between
the two functions it joins.
You can also see that two
functions interact with other functions
outside the boundaries of this story.
What is the nature of all these interactions?
They all have this in common: the functions communicate with each other
by passing information to each other. The arrows show the direction the
Let’s call the individual packets
of information “messages”.
I’ve added the names of the messages to the next slide.
These messages are the actual stuff of an
archive. That stuff can be the different forms of the OAIS Information Package
--you’ll notice that the AIP is being passed from one function to another—but
all the other kinds of messages are important, too.
The next slide
categorizes the messages in a way that I hope makes them more understandable.
BLUE – Metadata—parts or the whole of
information packages and descriptive information. On the well-known,
high-level OAIS diagram, they appear in little ovals.
GRAY – Internal computer
messages—not meant for human consumption
Next we’ll hear from Markus Enders from Göttingen, who will talk about metadata;
then I’ll come back and talk about some open questions.
PINK – Human-readable policies and
When I look at this diagram, with the messages highlighted this way, I start
to understand that the functions are information processing functions. They modify
the messages they receive and send them on—ultimately to storage or to the
human actors identified in the high level OAIS diagram: the Management, the
Producer, or the Consumer.
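One way to picture this is to treat each message as a small typed packet and each function as something that receives one message and sends another. The Python below is a minimal sketch under that assumption; the `Message` and `Kind` types, the `store_aip` function, and the identifier `aip-0001` are hypothetical and do not come from the OAIS document or from our implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    METADATA = "blue"  # information packages and descriptive information
    INTERNAL = "gray"  # internal computer messages
    POLICY = "pink"    # human-readable policies


@dataclass(frozen=True)
class Message:
    name: str
    kind: Kind
    payload: dict


def store_aip(aip: Message) -> Message:
    """Hypothetical storage step: accept an AIP and answer with a confirmation."""
    if aip.kind is not Kind.METADATA:
        raise ValueError("expected an information package")
    return Message(
        name="storage confirmation",
        kind=Kind.INTERNAL,
        payload={"aip_id": aip.payload["id"], "stored": True},
    )


aip = Message("AIP", Kind.METADATA, {"id": "aip-0001"})
confirmation = store_aip(aip)
print(confirmation.kind.value, confirmation.payload)
```

The point of the sketch is only that a function takes in one kind of message and emits another, here a blue information package in and a gray internal message out.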
Summary—last 5 minutes?
How much interoperation?
Archivists might flag objects to indicate importance or longevity.
We’ll respect other archivists’ decisions
OR We’ll take responsibility after negotiation
OR We’ll preserve regardless
A confirmation message need not be a
tagged message. I’m using this XML snippet to show the information that comes
back to the Co-ordinate Updates functions. The information will be passed to
the Data Management functions to enable
document discovery and access.
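As a rough illustration of what a tagged confirmation might carry, here is a short Python sketch that builds and reads back such a message with the standard library's `xml.etree.ElementTree`. The element names (`storageConfirmation`, `aipId`, `storageId`) and the identifier values are hypothetical; they are not the project's schema and not part of OAIS.

```python
import xml.etree.ElementTree as ET


def build_confirmation(aip_id: str, storage_id: str) -> str:
    """Serialize a confirmation as XML (element names are assumptions for this sketch)."""
    root = ET.Element("storageConfirmation")
    ET.SubElement(root, "aipId").text = aip_id
    ET.SubElement(root, "storageId").text = storage_id
    return ET.tostring(root, encoding="unicode")


# The receiving function parses the message and keeps the fields it needs
# to pass on for discovery and access.
xml_text = build_confirmation("aip-0001", "store-42")
parsed = ET.fromstring(xml_text)
record = {child.tag: child.text for child in parsed}
print(record)
```

The same two fields could just as well travel as a plain return value or a database row, which is why the confirmation need not be a tagged message at all.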
Next, let’s look at a diagram that shows all the individual functions grouped under