Yale/Elsevier E-Journal Archive Planning Project

New York, February 6, 2001

What is the project?

- An attempt to add meaningfully to the body of knowledge and practice related to digital archives and preservation

- A partnership with a large publisher (Elsevier Science) of many mission-critical academic materials

Personnel

All key players are internal - we are backfilling/outsourcing more routine duties
Scott Bennett and Ann Okerson as principal investigators
Paul Conway, project coordinator
David Gewirtz, technical specialist
Also other staff as needed

Why this publisher?

- Builds on earlier NERL local load and serving work with Elsevier (knowledge not lost)

- Large for-profit publishers are more likely than not-for-profits to abandon, for business reasons, unprofitable aspects of their business or "drags" on it

- Elsevier has a huge digital collection, 1100+ titles phased in over time (since 1995). Collections from a variety of sources, countries, offices.

- Collection in multiple e-formats mimics what would be available from a larger universe of publishers. A microcosm of the scholarly publishing world

- One publisher controls these titles and is eager to cooperate in the research.

Our challenges:

- The Elsevier collections already involve a wide variety of e-formats - how to sample these to make a valid long-term plan?

TIFF, PDF-wrapped TIFF, SGML, and XML, with variations within each

Titles come on stream at different times; any title could go through several formats

- Extracting samples is not easy, as we learned previously.

- What will we choose for rendering software? ScienceServer is on offer, but we believe we may want to keep our rendering quite separate from the Elsevier look and feel - and from the company.

- Identify the trigger events that make an archive essential. A unique aspect of the e-world is that the archive may be needed very soon: ownership of titles changes every year at the margins, so content rapidly becomes elusive or changed.

- To what extent, if at all, is content separable from functionality? Can content be defined unambiguously?

- What layers of agreement will be necessary, with whom, and when? What business models provide for the transfer of responsibility to an archival agent once the trigger conditions have been met?

- Can the traditional notions of an archive, especially the notions of primary and secondary uses of content, be mapped onto an archive of e-journal content?
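
One way to approach the sampling challenge listed above is to treat each combination of format and year as a stratum and draw a few items from every stratum, so that rare formats are not missed. The following is a minimal sketch under stated assumptions, not the project's method: the inventory records, their field names, and the function `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(issues, per_stratum, seed=0):
    """Pick up to `per_stratum` issues from every (format, year) group."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    strata = defaultdict(list)
    for issue in issues:
        strata[(issue["format"], issue["year"])].append(issue)

    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Small strata contribute everything they have.
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Hypothetical inventory: one title may pass through several formats.
inventory = [
    {"title": "Journal A", "year": 1995, "format": "TIFF"},
    {"title": "Journal A", "year": 1998, "format": "PDF-wrapped TIFF"},
    {"title": "Journal A", "year": 2000, "format": "SGML"},
    {"title": "Journal B", "year": 1998, "format": "PDF-wrapped TIFF"},
    {"title": "Journal B", "year": 2000, "format": "XML"},
    {"title": "Journal C", "year": 2000, "format": "XML"},
]

picked = stratified_sample(inventory, per_stratum=1)
for issue in picked:
    print(issue["format"], issue["year"], issue["title"])
```

Sampling per stratum rather than across the whole collection guarantees that every format and period appears at least once, at the cost of over-representing small strata.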

Plan of work (next 3 months)

- Preliminary technical platform established and subset of data obtained from Elsevier.

- Preliminary definition of appropriate business models defining preservation "triggering events."

- Deconstruction of the DTDs used by Elsevier Science to structure their e-journal content and of the "standardized" delivery mechanisms developed to transfer content within the Elsevier organization.

- Development of a conceptual framework for defining the separation of content and functionality within the defined structure of the Elsevier content.

- Development of a timeframe for the planning grant that has us writing a first draft report by Halloween.

- A sequence of brief background papers prepared on the working environment of digital preservation, on the status of preservation metadata, and on the nature of an archive in the digital world.
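
As a first step toward the DTD deconstruction listed above, one could simply inventory the element declarations in a DTD and their content models. The sketch below assumes XML-style declarations and uses an invented DTD fragment; it is not an Elsevier DTD, and SGML features such as tag-omission indicators or parameter entities would need additional handling. The names `element_declarations` and `SAMPLE_DTD` are hypothetical.

```python
import re

# Matches declarations of the form <!ELEMENT name content-model>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]+)>")


def element_declarations(dtd_text):
    """Return a mapping of element name -> content model found in a DTD."""
    # Drop comments first so commented-out declarations are ignored.
    without_comments = re.sub(r"<!--.*?-->", "", dtd_text, flags=re.DOTALL)
    return {
        name: " ".join(model.split())  # normalise internal whitespace
        for name, model in ELEMENT_DECL.findall(without_comments)
    }


# Invented fragment for illustration only.
SAMPLE_DTD = """
<!-- Invented fragment for illustration; not an Elsevier DTD -->
<!ELEMENT article (front, body)>
<!ELEMENT front   (title, author+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT author  (#PCDATA)>
<!ELEMENT body    (para+)>
<!ELEMENT para    (#PCDATA)>
"""

for name, model in element_declarations(SAMPLE_DTD).items():
    print(f"{name}: {model}")
```

An inventory like this gives a starting point for deciding which elements carry content and which exist mainly to support functionality.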
