Yale/Elsevier E-Journal Archive Planning Project
New York, February 6, 2001
What is the project?
- An attempt to add meaningfully to the body of knowledge and
practice related to digital archives and preservation
- A partnership with Elsevier Science, a large publisher of
many mission-critical academic materials
All key players are internal; we are backfilling or outsourcing
their more routine duties
Scott Bennett and Ann Okerson as principal investigators
Paul Conway, project coordinator
David Gewirtz, technical specialist
Also other staff as needed
Why this publisher?
- Builds on earlier NERL local load and serving work with
Elsevier (so that knowledge is not lost)
- Large for-profit publishers will be more driven than
not-for-profits by business reasons to abandon unprofitable
aspects of, or "drags" on, their business
- Elsevier has a huge digital collection: 1,100+ titles phased
in over time (since 1995), drawn from a variety of sources
- The collection's multiple e-formats mimic what would be
available from a larger universe of publishers: a microcosm of
the scholarly publishing world
- One publisher controls these titles and is eager to
cooperate in the research.
- The Elsevier collections already involve a wide variety of
e-formats - how to sample these to make a valid long-term
- TIFF, PDF-wrapped TIFF, SGML, and XML, with variations within each
- Titles come on stream at different times; any title could go
through several formats
- Extracting samples is not easy, as we learned previously.
- What will we choose for rendering software? ScienceServer is on
offer, but we may want to keep our rendering quite separate
from the Elsevier look and feel (and company).
- Identify trigger events when archives will become essential
- A unique aspect is that in the e-world the archive may be needed
very soon: ownership of titles changes every year at the
margins, so content rapidly becomes elusive or changed.
- To what extent is content separable from functionality? Is
it separable at all? Can content be defined unambiguously?
- What layers of agreement will be necessary, with whom, when,
etc.? What business models would provide for the transfer of
responsibility to an archival agent when the trigger conditions
have been met?
- Can the traditional notions of an archive, especially the
notions of primary and secondary uses of content, be mapped onto
an archive of e-journal content?
Plan of work (next 3 months)
- Preliminary technical platform established and a subset of
data obtained from Elsevier.
- Preliminary definition of appropriate business models
defining preservation "triggering events."
- Deconstruction of the DTDs used by Elsevier Science to
structure their e-journal content and of the "standardized"
delivery mechanisms developed to transfer content within the
- Development of a conceptual framework for defining the
separation of content and functionality within the defined
structure of the Elsevier content.
- Development of a timeframe for the planning grant that has us
writing a first-draft report by Halloween.
- A sequence of brief background papers prepared on the
working environment of digital preservation, on the status of
preservation metadata, and on the nature of an archive in the