Though the abstract of this talk suggests that it will focus on the California Digital Library’s (CDL’s) XTF infrastructure and its good fit with the technological requirements of eScholarship’s latest publishing project – for which XTF is indeed well suited – I have shifted the talk to focus more broadly on the other “X”, that is, XML, and its still-nascent role within academic publishing.  Hence the title:  “Whither goes eScholarship?  XML and the Mark Twain Project.”
For those of you unfamiliar with the eScholarship Program, it is a digital publishing initiative within the Office of Scholarly Communication at the University of California.  Though it falls under the aegis of the CDL – and hence happily coexists with the worlds of metadata and built content – eScholarship’s specific mandate is to “provide leadership and operational support for the University of California’s efforts to develop an innovative and sustainable scholarly publishing system.”
The Mark Twain Project, which I intend to focus on in this talk, is eScholarship’s latest venture into the digital academic publishing world.  In the broadest sense, this project promises an expansion of the critical edition as a scholarly tool, the development of a sustainable publishing model for digital scholarly editions, and a point of intersection between technology and humanities disciplines that are often technologically underrepresented.  I will discuss each of these points in finer detail in the course of this talk.  But first, a little background [switch to next slide] on our scholarly publishing initiatives thus far….
This preamble to my discussion of the Mark Twain Project is meant to give a sense of the kinds of services we’ve been developing through eScholarship, the questions that have arisen around these services and the ways in which the Mark Twain Project is a logical – and exciting – outgrowth of the work that has preceded it.
Thus far, the eScholarship Program has focused its academic publishing initiatives on two distinct digital platforms:  the eScholarship Repository and the eScholarship Editions.  The eScholarship Repository represents a partnership built between the UC library and the faculty in the service of enhancing faculty control over the publishing and dissemination of their scholarly work.  The eScholarship Editions, on the other hand, is one manifestation of a partnership between the CDL and the University of California Press, meant to extend the Press’s publishing capabilities by generating new publishing models (and new organizational configurations to support these models).
To focus first on the eScholarship Repository:  It is, at the moment, our most immediate and targeted response to the shifting model of scholarly publishing.  It offers a full-spectrum publishing platform that includes, as you can see, preprints and reports, peer-reviewed articles, edited volumes and peer-reviewed journals.  It exploits an already established University structure by granting research units and departments access to its publishing tools, thus distributing the editorial and administrative functions to the units that seek to publish within it.  And it’s really catching on.  Over 200 UC academic units and departments are currently using the Repository.  It now holds more than 11,000 papers and boasts 61,000 full-text downloads per week.  Its total downloads since its inception in April 2002 come to a sizeable 2.9 million.  (discuss graph)  The eScholarship Repository publishes its documents in PDF and is built upon the Berkeley Electronic Press (or bepress) commercial publishing platform.
eScholarship Editions is a different kind of publishing platform altogether.  First of all, it supports monographic or book-length publication exclusively.  And these monographs come to us via the UC Press, rather than directly from the author or the academic department.  The primary goal of the eScholarship Editions is to enhance UC Press’s editorial and technical publishing capacity, both by extending its publishing platforms and by reconfiguring its editorial workflow.  We are also interested in the promise XML holds for the user as a more flexible and robust display technology.  The eScholarship Editions, as TEI-compliant XML texts, are built upon the CDL’s existing structured-text infrastructure, the eXtensible Text Framework (XTF), a technology that allows for finer granularity in search results and textual display.
Our eScholarship XML publishing infrastructure offers the following functionalities.  The ability to:
* configure book display and branding by collection
* show or hide the table of contents in a frame next to any book section
* choose from several "large print" versions
* search an individual book, a collection, or a set of collections, and follow the search down into a specific book, with term highlighting
* control access based on IP address ranges
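To make two of these functions concrete (searching a book with term highlighting, and IP-range access control), here is a minimal Python sketch.  It assumes a simplified, TEI-like book in which each chapter is a non-nested `<div>` with a `<head>` title; the markup, the function names `search_book` and `is_allowed`, and the IP ranges are all hypothetical.  XTF is a separate framework whose actual implementation differs.

```python
import ipaddress
import re
import xml.etree.ElementTree as ET

# Hypothetical TEI-like markup: each chapter is a <div> with a <head> and <p>s.
BOOK = """<text><body>
<div n="1"><head>Chapter One</head><p>The river was wide and slow.</p></div>
<div n="2"><head>Chapter Two</head><p>A raft drifted down the river at night.</p></div>
</body></text>"""

# Hypothetical campus address ranges permitted to view restricted books.
ALLOWED_RANGES = [ipaddress.ip_network("192.0.2.0/24"),
                  ipaddress.ip_network("198.51.100.0/25")]


def is_allowed(client_ip: str) -> bool:
    """Return True if the client's address falls inside any allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_RANGES)


def search_book(xml_text: str, term: str) -> list[tuple[str, str]]:
    """Return (section title, highlighted paragraph) pairs containing term.

    Assumes chapters (<div>s) are not nested inside one another.
    """
    root = ET.fromstring(xml_text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for div in root.iter("div"):
        title = div.findtext("head", default="")
        for p in div.iter("p"):
            text = "".join(p.itertext())
            if pattern.search(text):
                # Wrap each match in brackets that a display layer could style.
                hits.append((title, pattern.sub(lambda m: f"[{m.group(0)}]", text)))
    return hits
```

For example, `search_book(BOOK, "river")` returns one hit per chapter, with the section title and the paragraph showing `[river]` highlighted, and `is_allowed("192.0.2.17")` is `True` while `is_allowed("203.0.113.5")` is `False`.  Keeping the section title alongside each hit is what lets a reader follow a collection-level search down into a specific part of a book.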
While we have been delighted with the success of both of these publishing platforms, each has its own technological hurdles, sometimes even walls, that we must clear in order to keep moving forward with the development of these services.  For the Repository, the hurdles lie in the fact that the technology is optimized for journal and journal-article publication.  A few of our distributed monographic series (e.g. UC International and Area Studies) publish monographs within the Repository and report that it is at times an awkward fit, seemingly more comfortable producing documents at the level of the chapter than the book.  In addition, bepress currently supports PDF publishing exclusively, forcing a split between the technologies of our two publishing platforms.  The challenge for eScholarship Editions, on the other hand, resides precisely in the fact that it is an XML publishing platform.  Our first hurdle:  how to translate the Press’s post-compositor manuscripts into XML-tagged files.  The solution thus far has been to send the manuscripts off to India to be rekeyed and tagged:  an expensive and logistically difficult endeavor.  Alternatives?  Developing a system for producing, as part of the editorial process, XML-tagged files that can serve as master files for any kind of publishing the Press desires for its content.  Though the Press has shown strong interest in this emerging technology, we have yet to identify a reliable way of extracting XML from their editorial process without disrupting it rather significantly.
And such potentially dramatic shifts in editorial practice necessarily raise the question [next slide]:  Why XML?
Read slide.  These are questions we don’t yet have answers to…
To return, then, to our current Twain project:  In the realm of the scholarly or critical edition of a literary work, XML seems to need no justification:  it is the perfect marriage of form and function.  XML is an ideal technology for displaying and organizing text at the level of the sentence, the word, even the bibliographic note.  It makes manifest the kind of fine attention to textual detail that is the hallmark of the critical edition.  It also enables the kind of integrated search and display functionality that could render these critical editions even richer and more suggestive as research tools.
As an archive and a scholarly editorial project, the Mark Twain Papers and Project at UC Berkeley’s Bancroft Library is in the rare position of being able to provide immediate access to both primary and secondary sources.  The Papers are keen to develop an online system for integrating these two functions, thus creating critical editions of Twain’s writings that allow for a level of interactivity and integrated reading experience that is unimaginable in a print edition.  At its most basic level, the digital Project will fold Mark Twain’s private papers, comprehensive historical annotation and complete critical apparatuses into the texts themselves.  The project aims to provide unprecedented digital access to critical editions of all of Mark Twain’s writings, starting with the works and papers that have already been published by the UC Press in print form and with thousands of additional letters by Mark Twain never before made available in a single source.
But, you might ask, what has prompted the Mark Twain Papers and Project to enter the brave new world of the digital critical edition after 35 years of sustained print publication?  Two things in particular stand out.
The first is a matter of copyright.  The Mark Twain Foundation has granted the University of California Regents exclusive rights to the digital publication of Mark Twain’s writings for a ten-year period (2002-2012).  The Mark Twain Papers and Project and the UC Press are keen to capitalize on this agreement and, if possible, extend it.  In addition, the Papers and Project have been the recipient of several major NEH grants specifically designated to fund the digital rather than print publication of critical editions of Twain. Most recently, a major software company has offered to fund the scanning of all of Mark Twain’s works in exchange for use of the content, and that contract is pending.  The resultant facsimiles will be incorporated into the web site as well.
As you might imagine, the technological demands of mounting a scholarly digital project like this often exceed the capacity of those who work in the world of the critical edition.  Too frequently, critical editions exist as a labor of love rather than a rationalized process with institutional or external support.  Our goal with the Mark Twain Project is to build a prototype, both technological and institutional, for the development of scalable and sustainable digital scholarly editions of literary works.  Our best hope for achieving that goal resides in the collaborative model we’ve established among the CDL, the Mark Twain Papers and Project and the UC Press.
May not be easy to read…
This is a simple diagram to give you a sense of the core technologies underlying the Mark Twain Project.  As you can see, the primary components are the web site, the eXtensible Text Framework, the Database and the Database Client.  Editors at the MTP will use the client to enter descriptive metadata into a unified database (most likely MySQL):  metadata about various works, as well as any associated bibliographic, source, digital-surrogate, or biographical material cross-referenced from the TEI texts of those works.  That metadata will then be indexed and queried either as objects, using services from the CDL Common Framework, or through more direct lookups using Saxon SQL extensions; we’re not sure yet which, hence the dotted line.  The TEI documents themselves will be handled by the CDL’s eXtensible Text Framework (XTF), a flexible indexing and query tool that supports searching across collections of heterogeneous data and presents results in a highly configurable manner.
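The cross-referencing between TEI texts and the editors’ database can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions:  SQLite stands in for the (probably MySQL) database, people in the text are tagged with TEI `<persName ref="#...">` pointers, and the table layout, record IDs, sample sentence (invented, not a Twain quotation) and function names are hypothetical.  The real system would go through the CDL Common Framework or Saxon SQL extensions rather than code like this.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI fragment; ref attributes point at database records.
TEI_LETTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>Placeholder text mentioning <persName ref="#howells-wd">Howells</persName> and
<persName ref="#twichell-jh">Twichell</persName>.</p>
</body></text></TEI>"""

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def build_metadata_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the editors' biographical database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, note TEXT)")
    db.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [("howells-wd", "William Dean Howells", "Novelist and editor"),
         ("twichell-jh", "Joseph H. Twichell", "Hartford minister")],
    )
    return db


def annotate_people(tei_xml: str, db: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Resolve each persName pointer in the TEI text to its metadata record.

    Returns (name as written in the text, full name, note) tuples; pointers
    with no matching record are skipped.
    """
    root = ET.fromstring(tei_xml)
    results = []
    for pers in root.iterfind(".//tei:persName", TEI_NS):
        key = pers.get("ref", "").lstrip("#")
        row = db.execute("SELECT name, note FROM person WHERE id = ?", (key,)).fetchone()
        if row is not None:
            results.append((pers.text, row[0], row[1]))
    return results
```

Calling `annotate_people(TEI_LETTER, build_metadata_db())` pairs each name as it appears in the letter with its biographical record.  The design point is the one the diagram makes:  the TEI text carries only a pointer, while editors maintain the annotation once, in the database, for every text that cites it.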
Should all go as planned, we expect to launch the first version of this web site in March 2007.  The site’s first content will include:
A Connecticut Yankee in King Arthur’s Court
Huck Finn
Roughing It
Letters through 1880
(All will be published as critical editions with apparatus)
RE: A Connecticut Yankee in King Arthur’s Court
What I like about this quote from Twain is that it suggests a model of the book as a palimpsest, a collapsing of what is written, what has been left out, and what can never be said.  And it recognizes the latent multiplicity of a text:  the way in which letting loose those suppressed versions, those restrictive choices, would require a library to house the textual proliferation that would ensue.  That’s precisely what we’re doing here, I would argue.  We’re providing a digital library to enable just that kind of fascinating proliferation at the level of the text and of its critical apparatus.  We hope researchers will find it indispensable … we suspect Mark Twain would approve.