Though the abstract of this talk
suggests that it will focus on the CDL’s XTF infrastructure and its good match
with the technological requirements of eScholarship’s latest publishing
project – for which XTF is indeed well suited – I have shifted the talk to
focus more broadly on the other “X” – that is XML – and its still-nascent role
within academic publishing. Hence the
title: “Whither goes eScholarship? XML and the Mark Twain Project”
For those of you unfamiliar with the
eScholarship Program, it is a digital publishing initiative within the Office
of Scholarly Communication at the University of California. Though it falls under the aegis of the CDL –
and hence happily coexists with the worlds of metadata and built content –
eScholarship’s specific mandate is to “provide leadership and operational
support for the University of California’s efforts to develop an innovative
and sustainable scholarly publishing system.”
The Mark Twain Project, which I intend
to focus on in this talk, is eScholarship’s latest venture into the digital
academic publishing world. In the
broadest sense, this project promises an expansion of the critical edition as
a scholarly tool, the development of a sustainable publishing model for
digital scholarly editions, and a node of intersection between technology and
often technologically underrepresented disciplines within the humanities. I will discuss each of these points in finer
detail in the course of this talk. But
first, a little background [switch to next slide] on our scholarly publishing
initiatives thus far….
This preamble to my discussion of the
Mark Twain Project is meant to give a sense of the kinds of services we’ve
been developing through eScholarship, the questions that have arisen around
these services and the ways in which the Mark Twain Project is a logical – and
exciting – outgrowth of the work that has preceded it.
Thus far, the eScholarship Program has
focused its academic publishing initiatives on two distinct digital
platforms: the eScholarship Repository
and the eScholarship Editions. The
eScholarship Repository represents a partnership built between the UC library
and the faculty in the service of enhancing faculty control over the
publishing and dissemination of their scholarly work. The eScholarship Editions, on the other
hand, is one manifestation of a partnership between the CDL and the University
of California Press, meant to extend the Press’s publishing capabilities by
generating new publishing models (and new organizational configurations to
support these models).
To focus first on the eScholarship Repository: It is, at the moment, our most immediate and
targeted response to the shifting model of scholarly publishing. It offers a full-spectrum publishing
platform that includes, as you can see, preprints and reports, peer-reviewed
articles, edited volumes and peer-reviewed journals. It exploits an already established
University structure by granting research units and departments access to its
publishing tools – thus distributing the editorial and administrative
functions to those entities who seek to publish within it. And it’s really catching on. Over 200 UC academic units and departments
are currently using the repository. It
now holds more than 11,000 papers and boasts 61,000 full-text downloads per
week. Its total downloads since its
inception in April 2002 – a sizeable 2.9 million. (discuss graph) The eScholarship Repository publishes its
documents in PDF and is built upon the Berkeley Electronic Press (or bepress)
commercial publishing platform.
Editions is a different kind of publishing platform altogether. First of all, it supports monographic or
book-length publication exclusively. And these monographs come to us via the
University Press, rather than
directly from the author or the academic department. The primary goal of the eScholarship Editions is to
enhance UC Press’s editorial and technical publishing capacity – both in terms of extending its publishing
platforms and reconfiguring
its editorial workflow. We are also interested in the promise XML holds for the user as a more
flexible and robust display technology.
Editions, as TEI-compliant XML texts, are built upon CDL’s existing structured text infrastructure (or XTF) platform, a technology
that allows for a finer granularity of search results and textual display.
Our eScholarship XML publishing infrastructure offers the following functionalities.
The ability to:
* configure book display and branding by collection
* show or hide the table of contents in a frame next to any book section
* choose from several "large print" versions
* search an individual book or across a collection or set of collections;
further ability to follow the search down into a specific book, with term
* control access based on IP address ranges
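To make the last item in that list concrete, here is a minimal sketch of what checking a reader's address against a set of permitted ranges can look like. It is an illustration only, not a description of how XTF does it: the function name `is_allowed` and the ranges themselves are hypothetical (the addresses are reserved documentation ranges, not real campus networks).

```python
import ipaddress

# Hypothetical permitted ranges, using addresses reserved for documentation.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(client_ip, allowed_ranges=ALLOWED_RANGES):
    """Return True if client_ip falls inside any of the permitted ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed_ranges)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

A collection could carry its own list of ranges, so that one set of books is open to everyone while another is limited to campus readers.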
While we have been delighted with the success of both of these publishing
platforms, each has its own technological hurdles, sometimes even walls, that
we must clear in order to keep moving forward with the development of these
For the Repository, the hurdles lie in the fact that
the technology is optimized for journal and journal article publication.
A few of our distributed monographic series
(e.g. UC International and Area Studies) publish monographs within the
Repository and report that it is at times an awkward fit – seemingly more
comfortable producing documents at the level of the chapter than the
In addition, bepress currently
supports PDF publishing exclusively, necessitating a bifurcation between the
technologies of our two publishing platforms.
The challenge for eScholarship Editions, on the other
hand, resides precisely in the fact that it is an XML publishing
Our first hurdle:
how to translate the Press’s post-compositor
manuscripts into XML-tagged files.
thus far has been to send the manuscripts off to India to be rekeyed and tagged:
an expensive and logistically difficult
Developing a system for producing, as part
of the editorial process, XML-tagged files that can serve as master files for
any kind of publishing the Press desires for its content.
Though the Press has shown strong interest
in this emerging technology, we have yet to identify a reliable way of
extracting XML from their editorial practice without disrupting it rather
And such potentially dramatic shifts in editorial practice necessarily beg the
question [next slide]:
Read slide. These are questions we don’t yet have
To return, then, to our current Twain
In the realm of the scholarly
or critical edition of a literary work, XML seems to need no
it is the perfect
marriage of form and function.
an ideal technology for displaying and organizing text at the level of the
sentence, the word, even the bibliographic note.
It makes manifest the kind of fine attention
to textual detail that is the hallmark of the critical edition.
It also enables the kind of integrated search
and display functionality that could render these critical editions even richer
and more suggestive as research tools.
As an archive and
a scholarly editorial project, the Mark Twain Papers and Project
and Papers at UC Berkeley’s Bancroft Library is in the rare position of being
able to provide immediate access to both primary and secondary sources.
The Papers are keen to develop an online
system for integrating these two functions – thus creating critical editions
of Twain’s writings that allow for a level of interactivity and integrated
reading experience that is unimaginable in a print edition.
At its most basic level, the digital Project
will fold Mark Twain’s private papers, comprehensive historical annotation and
complete critical apparatuses into the texts themselves. The project aims to
provide unprecedented digital access to critical editions of all of Mark
Twain’s writings, starting with the works and papers that have already been
published by the UC Press in print form and with thousands of additional
letters by Mark Twain never before made available in a single source.
But, you might ask, what has prompted
the Mark Twain Papers and Project to enter the brave new world of the digital
critical edition after 35 years of sustained print publication?
Two things in particular stand out.
The first is a matter of copyright.
The Mark Twain Foundation has granted the University of California Regents
exclusive rights to the digital publication of Mark Twain’s writings for a
ten-year period (2002-2012).
The Mark Twain Papers and Project and the UC Press are keen to capitalize on this
agreement and, if possible, extend it.
In addition, the Papers and Project have received several
major NEH grants specifically designated to fund the digital rather than print
publication of critical editions of Twain. Most recently, a major software company
has offered to fund the scanning of all of Mark Twain’s works in exchange for
use of the content, and that contract is pending.
The resultant facsimiles will be
incorporated into the web site as well.
As you might imagine, the technological demands of mounting a scholarly
digital project like this often exceed the capacity of those who work in the
world of the critical edition.
frequently, critical editions exist as a labor of love rather than a
rationalized process with institutional or external support.
Our goal with the Mark Twain Project is to
build a prototype - both technological and institutional - for the development
of scalable and sustainable digital scholarly editions of literary works.
Our best hope for achieving that goal
resides in the collaborative model we’ve established among the CDL, the Mark
Twain Papers and Project and the UC Press.
May not be easy to read…
This is a simple diagram to give you a
sense of the core technologies underlying the Mark Twain Project. As you can see, the primary components are
the web site, the eXtensible Text Framework, the Database and the Database
Client. Editors at the MTP (will) use
the client to enter into the unified (most likely MySQL) database descriptive
metadata about various works as well as any associated bibliographic, source,
digital surrogate, or biographical material cross-referenced from the TEI
texts of those works. That metadata about the objects will then be indexed and
queried either as objects using services from the CDL Common Framework or with
more direct lookups using Saxon SQL extensions – we’re not sure yet which,
hence the dotted line. The TEI documents themselves will be handled by the
CDL’s eXtensible Text Framework (XTF), a flexible indexing and query tool that
supports searching across collections of heterogeneous data and presents
results in a highly configurable manner.
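As a rough illustration of the arrangement just described, the sketch below shows how cross-references inside a TEI text might be resolved against descriptive metadata held in a relational database. Everything in it is an assumption made for the sake of the example: Python's built-in `sqlite3` stands in for the (most likely MySQL) database, and the table name, column names, record identifiers and the use of `ref` elements with `target` attributes are hypothetical rather than the Project's actual schema or encoding practice.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment: two cross-references pointing at database records.
TEI_SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <text><body>
    <p>A letter to <ref target="#bio001">a correspondent</ref>,
       citing <ref target="#src042">a manuscript source</ref>.</p>
  </body></text>
</TEI>
""".strip()


def build_metadata_db():
    """Create an in-memory stand-in for the unified metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id TEXT PRIMARY KEY, kind TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [
            ("bio001", "biographical", "Placeholder biographical record"),
            ("src042", "source", "Placeholder source record"),
        ],
    )
    return conn


def resolve_references(tei_xml, conn):
    """Look up each cross-reference in the TEI text in the metadata table."""
    root = ET.fromstring(tei_xml)
    resolved = []
    for ref in root.iter(f"{{{TEI_NS}}}ref"):
        object_id = ref.get("target", "").lstrip("#")
        row = conn.execute(
            "SELECT kind, label FROM objects WHERE id = ?", (object_id,)
        ).fetchone()
        # row is None when the text points at a record that does not exist.
        resolved.append((object_id, row))
    return resolved


if __name__ == "__main__":
    for object_id, record in resolve_references(TEI_SAMPLE, build_metadata_db()):
        print(object_id, record)
```

The point of the sketch is the division of labor: the text carries only identifiers, while the editors maintain the descriptive records in one place through the client.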
Should all go as planned, we expect to
launch the first version of this web site in March 2007.
The site’s first content will include:
A Connecticut Yankee in King Arthur’s Court
Letters through 1880
(All will be published as critical editions with apparatus)
RE: A Connecticut Yankee in King Arthur’s Court
What I like about this quote from Twain is that it suggests a model of the
book as a palimpsest – as a collapsing of what is written, what has been left
out, and what can never be said.
recognizes the latent multiplicity of a text – the way in which letting loose
those suppressed versions, those restrictive choices would necessitate a
library to house the textual proliferation that would ensue.
That’s precisely what we’re doing here, I
We’re providing a digital
library to enable just that kind of fascinating proliferation at the level of
the text and of its critical apparatus.
We hope researchers will find it indispensable … we suspect Mark Twain