The workflow should be adaptable to other production systems.
At the NDIIPP Partners meeting in January in Berkeley, we were asked to participate in an exercise. We were given Post-it notes and Sharpies and were asked the first question in each column. We wrote our answers on the Post-its and hung them on the wall. I think some categories were already set up, but I can’t remember. I do remember that the answers were compiled and the results were shared with us.
Using these questions as a way to build this discussion of our DIGARCH project made sense in light of the way that NDIIPP is building this national digital preservation infrastructure.
Conversations with History
Institute of International Studies
UC Berkeley
Harry Kreisler, executive producer and moderator
Started in 1982, more than 300 shows to date
Guests include diplomats, statesmen, and soldiers; economists and political analysts; scientists and historians; writers and foreign correspondents; activists and artists. The interviews span the globe and include discussion of political, economic, military, legal, cultural, and social issues shaping our world. At the heart of each interview is a focus on individuals and ideas that make a difference.
Broadcast on UCTV
Staff of UCTV comprise a subset of the staff of UCSD TV
UCSD TV’s schedule looks similar to that of a public television station.
UCTV delivers documentaries, faculty lectures, cutting-edge research symposiums and artistic performances from each of the ten UC campuses. They are available on TV and are also webcast.
For the purposes of this project, the station is UCTV.
For this production, Conversations with History, the UCSD Libraries didn’t need any expertise. They didn’t have a stake in this particular program.
However, like many libraries, the UCSD Libraries does need the expertise of the SDSC in preservation of digital assets.
For this project we are capturing the TV show, the transcript and the descriptive metadata from the FileMaker Pro database at UCTV.
The interview is taped by UCB staff.
At this point Harry used to send the description of the show via email to UCTV for data entry.
This process has been streamlined so that Harry has access to the UCTV database. Staff at UCTV have refined their database to include pulldown menus from which Harry chooses descriptive terms for each show.
UCTV also makes three versions of the original digital file. The three versions are an edited DV version, an MPEG version, and a RealPlayer version.
UCTV outputs an XML record containing preservation descriptive metadata for the interview and preservation technical metadata for each of the content files. This forms the first SIP for preservation.
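To make the shape of that first SIP concrete, here is a minimal sketch in Python. The element names, attribute names, program number and file names are all hypothetical; the actual record exported from the UCTV FileMaker Pro database is not shown in these notes. The only things taken from the project are the structure (one descriptive block for the interview, one technical entry per content file) and the three file formats.

```python
import xml.etree.ElementTree as ET


def build_sip(program_number, title, guest, files):
    """Build an illustrative SIP record.

    All element and attribute names here are assumptions made for this
    sketch, not the schema UCTV actually exports.
    """
    sip = ET.Element("sip", {"programNumber": program_number})

    # Descriptive metadata: one block for the whole interview.
    desc = ET.SubElement(sip, "descriptive")
    ET.SubElement(desc, "title").text = title
    ET.SubElement(desc, "guest").text = guest

    # Technical metadata: one entry per content file.
    tech = ET.SubElement(sip, "technical")
    for f in files:
        ET.SubElement(tech, "file", {"name": f["name"], "format": f["format"]})
    return sip


# Placeholder values; only the three formats come from the project.
files = [
    {"name": "show.dv", "format": "DV"},
    {"name": "show.mpg", "format": "MPEG"},
    {"name": "show.rm", "format": "RealPlayer"},
]
sip = build_sip("0000", "Example interview", "Example guest", files)
print(ET.tostring(sip, encoding="unicode"))
```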
We’re still debating this piece.
This model shows the staff from CwH submitting the transcript to SDSC, but SDSC is now working out a way to harvest the transcript from the web site.
The debate came about when two issues arose:
1. Which version of the transcript should be preserved? The original as it came back from the transcriber or the version that was on the web site? What were we trying to document?
2. Were we creating new work or additional tasks for the staff at CwH? In thinking back to the goals, remember the goal to minimally disrupt the existing workflow.
This is where the Libraries’ expertise comes in.
First SIP is for descriptive, technical and rights metadata about the TV show.
Second SIP describes the transcript and is much simpler (technical metadata plus UCTV program number).
The SIPs for the metadata are consolidated in an AIP expressed as a METS document. The METS wrapper utilizes standard data formats. The descriptive metadata is expressed in the MODS schema, the technical metadata in the PREMIS schema, and the rights metadata in the METSRights schema.
The METS wrapper properly relates the metadata and content files and indicates which file is the original master, which file contains the broadcast version and which file contains the web (streaming) version. For the long view, the METS wrapper allows the content files and their metadata to be transmitted to other repositories and to interoperate more easily with other collection materials should that be desirable.
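A rough sketch of how such a wrapper hangs together, written in Python with the standard library. It is a skeleton only: the metadata payloads (the MODS, PREMIS and METSRights records themselves) are left out, so the output is not schema-valid METS. The IDs, file names and the USE labels "master", "broadcast" and "web" are assumptions chosen for illustration, not the values in the project's AIP.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(files):
    """Build a METS skeleton relating metadata sections to content files."""
    mets = ET.Element(q("mets"))

    # Descriptive metadata section (the MODS record would go inside mdWrap).
    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "MODS"})

    # Administrative metadata: one PREMIS technical record per file,
    # followed by one rights record.
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})
    for i, _ in enumerate(files, 1):
        tech = ET.SubElement(amd, q("techMD"), {"ID": f"TECH{i}"})
        ET.SubElement(tech, q("mdWrap"), {"MDTYPE": "PREMIS"})
    rights = ET.SubElement(amd, q("rightsMD"), {"ID": "RIGHTS1"})
    ET.SubElement(rights, q("mdWrap"),
                  {"MDTYPE": "OTHER", "OTHERMDTYPE": "METSRights"})

    # File section: the USE attribute marks the role of each version.
    file_sec = ET.SubElement(mets, q("fileSec"))
    for i, f in enumerate(files, 1):
        grp = ET.SubElement(file_sec, q("fileGrp"), {"USE": f["use"]})
        fe = ET.SubElement(grp, q("file"),
                           {"ID": f"FILE{i}", "ADMID": f"TECH{i}"})
        ET.SubElement(fe, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": f["href"]})

    # Structural map: ties the descriptive record to all content files.
    struct = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct, q("div"), {"DMDID": "DMD1"})
    for i, _ in enumerate(files, 1):
        ET.SubElement(div, q("fptr"), {"FILEID": f"FILE{i}"})
    return mets


# Hypothetical file names and role labels.
files = [
    {"use": "master", "href": "show.dv"},
    {"use": "broadcast", "href": "show.mpg"},
    {"use": "web", "href": "show.rm"},
]
mets = build_mets(files)
print(ET.tostring(mets, encoding="unicode"))
```

The point of the sketch is the cross-referencing: each file points to its own technical record through ADMID, and the structural map points to the descriptive record through DMDID, which is how the wrapper keeps metadata and content files related.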
The SIP from UCTV can be expressed in the basic METS schema if that is desirable for the digital repository. In that case, the CwH SIP would have to be integrated into the METS as a PREMIS technical metadata record for the digital transcript.
Here’s where SDSC expertise comes in and pulls the pieces together.
Data Grid Technology
The preservation of CwH content is based on using data grid technology to manage distributed data. The Storage Resource Broker (SRB) is used as the preservation repository. A central metadata catalog (MCAT) manages preservation metadata for each video file. A dedicated MCAT instance called UCTVStudioArchive was set up. Two additional logical storage resources were registered to store digital video replicas on SAMQfs (uctv-fs1) and HPSS (hpss-sdsc). Also, SRB client software and Kepler scientific workflow software were installed on the eMac machine at UCTV. Finally, a grid brick (srbbrick7) with 300 GB of disk space was configured for the DIGARCH project.
Kepler workflow
Automated workflow that looks for data, ingests it, replicates it safely for long term preservation
Kepler is an “executable Visio” type program, which allows the building of a workflow program by dragging and dropping components, gluing them together, and executing the overall flow. Components can be grid-enabled and perform grid type operations (put, get, authenticate, monitor, report, filter, store, discover, etc.).
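The find, ingest, replicate pattern that the workflow automates can be sketched in a few lines of Python. This is not Kepler and it does not talk to SRB: local directories stand in for the storage resources, a plain file copy stands in for the grid put and replicate operations, and the function and directory names are made up for the example. The checksum comparison after each copy is an assumption about what "replicates it safely" would involve, not a description of the project's actual verification step.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def ingest_and_replicate(inbox, resources):
    """Copy every file in inbox to each resource directory and verify it.

    Returns a report mapping each file name to the list of replica paths.
    """
    report = {}
    for src in sorted(Path(inbox).iterdir()):
        if not src.is_file():
            continue
        expected = sha256(src)
        copies = []
        for res in resources:
            dest = Path(res) / src.name
            shutil.copy2(src, dest)
            # Refuse to count a replica whose checksum differs from the source.
            if sha256(dest) != expected:
                raise IOError(f"checksum mismatch for {dest}")
            copies.append(str(dest))
        report[src.name] = copies
    return report


# Demo with temporary directories standing in for two storage resources.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inbox = root / "inbox"
    resources = [root / "disk", root / "tape"]
    for d in [inbox, *resources]:
        d.mkdir()
    (inbox / "show.mpg").write_bytes(b"placeholder video bytes")
    report = ingest_and_replicate(inbox, resources)
    print(report)
```

In the real setup each of these steps would be a Kepler component wired to the SRB client, and the report would be recorded as preservation metadata in the MCAT rather than printed.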
It’s a long way from Harry’s interview with Kofi Annan to SRB grid brick 7.
Communication is key. Diagrams, meetings. Break down the workflows into small pieces.
Eye to the bigger picture. Does this -- workflow, process, AIP -- work for other TV productions or is this process unique to CwH?
Research and production
TV group and the library group are production oriented – when is this thing going to work?
SDSC group is research oriented – let’s find the best way to do this process that no one has ever done before
Research and production are two areas of expertise brought to the table.
With good communication and keeping an eye to the bigger picture, the tension between research and production can be eased.