Through the New York Public Library's participation in the Mellon Electronic Journal Archiving Program, the Library was able to conduct a detailed investigation into the issues related to establishing a secure repository for archived electronic resources in the performing arts.
The project gave the Library the opportunity to gain a thorough knowledge of the landscape of electronic publishing in music, theater, dance, and film, and it also allowed the Library to investigate the special issues that must be addressed when planning for the long-term preservation of information in electronic format. Electronic content that has been the focus of archival studies and archival projects — including work by other libraries participating in the Mellon Electronic Journal Archiving Program — has mainly consisted of lengthy, highly structured, and professionally produced journal runs that are the product of major publishers, most typically in scientific, technical, and medical (STM) fields. Among electronic resources in the performing arts, however, few examples can be found that fit this profile. Instead, these resources are most typically produced by publishers — individuals or small groups of like-minded people — with few financial resources who produce only a single title as a labor of love. Consequently, the Library took a broad view of the term "electronic journal" for its project, although it concentrated on resources that were "journal-like," that is, resources that are produced in a serial fashion, containing content of interest to sophisticated research by professionals and scholars.
Among the project's substantial contributions were the identification of a large number of such resources currently available which will be of special interest to the field of the performing arts, and the responses of e-publishers to a survey regarding electronic preservation issues. Another major contribution will be of interest both in and outside the field: the results of the Library's investigations into methods for gathering electronic content in a systematic fashion with the purpose of building and maintaining the archive. The issues raised here will be of interest to librarians, publishers, and others concerned with preserving electronic information that is "off the beaten path." Created without the backing of major publishers or academic institutions, this is information produced outside of traditional major channels of publication and distribution, the new "gray literature."
Ultimately, the Library decided not to submit a second-stage implementation proposal to the Mellon Foundation, although the Library will continue to explore some more limited preservation efforts within the framework of a collaborative project led by Stanford University. The following report gives details on the project's analysis of the landscape of performing arts electronic resources, work on content development and implementation planning, and the strategic thinking that went into the decision not to proceed at this time with an implementation effort that builds directly on the results of the planning project reported on here.
Research libraries are concerned to a great degree with preservation. Today, this concern extends not only to the preservation of the manuscripts, books, periodicals, films, recordings, and other materials that line their shelves, but also to preservation of their intellectual content. An archival manuscript, for example, comprises the text and the physical artifact, and both are valuable research resources. Historically, library preservation has extended to the physical conservation of archival collections, the preservation of topical information such as newspapers and journals in the form of microfilm and microfiche, and the protection of degradable materials through appropriate environmental controls. However, the increasing production of information in electronic form has opened up new avenues of exploration in the area of archival preservation. Major research institutions such as the New York Public Library as well as electronic publishers themselves now face the added challenge of ensuring that electronic scholarly journals and publications collected by libraries will be accessible to future generations of readers and scholars.
To this end, the New York Public Library, in response to the Andrew W. Mellon Foundation's invitation for participation in the Mellon Electronic Journal Archiving Program, undertook a planning project that focused on archiving electronic journals in the performing arts to address the long-term preservation of these materials.
The New York Public Library has, from its very beginnings, placed a high priority on safeguarding all its collections for the future, establishing one of the first preservation programs in a research library. Today, the Library hosts one of the nation's largest such programs and works actively together with other leading institutions on addressing important issues related to the preservation of library materials.
The Library has also shown strong leadership in the application of digital technology through a highly sophisticated digital library program now in development that will make hundreds of thousands of materials from its research collections available on the Internet. As part of this program, the Library has given special attention to the establishment of systems, policies, and procedures for archiving information in electronic form.
The choice to focus on the domain of performing arts was made for two very sound reasons:
First, the Foundation's invitation to participate in the Electronic Journal Archiving Program caused the Library to think in new ways about future readership of scholarly electronic materials in subject collections that are special strengths for the Library, such as the dance, music, and recorded sound collections at the New York Public Library for the Performing Arts. This facility serves a broad constituency of hundreds of thousands of annual users — dancers, musicians, actors, playwrights, conductors, choreographers, stage directors, critics, historians, teachers, students, and people from all walks of life — and has become an unparalleled resource for information in the performing arts. While many research libraries have overlapping electronic collections, especially in the realm of science, technology, and medicine, and a reader is able to access information from a variety of services, the Library for the Performing Arts is focused on providing subject-specific materials that are not widely collected or widely available through a single resource.
Second, the Mellon Electronic Journal Archiving Program emphasized not only "the issues relating to electronic scholarly journals" but also "the likely loss to future generations of scholars of material published uniquely in the electronic medium." For the librarians, archivists, and curators who grapple daily with the challenge of format diversity in written, printed, and recorded materials, the Foundation's focus on the electronic medium resonated with concerns about the preservation of non-print materials — which make up a major portion of the collections in the performing arts — as well as issues regarding electronically-rendered versions of print materials. More importantly, the Foundation's project spoke to the developing concern, especially in performing arts studies, about the preservation of publications found only in electronic format, which are at significant risk.
Performing arts studies actually offer a relatively small range of scholarly journals within the confines of the printed form, if one means by "scholarly" refereed journals issued by learned societies or through established publishers. What is starting to become more prevalent, however, is an interesting universe of material made available that takes advantage of the multimedia opportunities afforded by the World Wide Web, opportunities that have been very attractive both because they appeal to the sense of creativity of those involved in the performing arts and because of the relative ease with which publishing enterprises on the Web can be launched.
What can now be found on the Web in the performing arts ranges from very well-produced, highly structured, and highly specialized magazines, to informal tabloid fan-zines full of unedited commentaries, original compositions, and performance reviews. Some are produced under the auspices of traditional publishers and others are produced independently. Rigor aside, all of this is of tremendous importance to scholars and researchers of the performing arts in assessing the impact of artists and the creative enterprise on the wider society. Not surprisingly, the Library for the Performing Arts has collected, and continues to collect, this sort of material very extensively, both in electronic form and in print.
Within the Mellon Electronic Journal Archiving Program, the New York Public Library's focus on the performing arts provided a contrast to the projects of the other participants which focused primarily on electronic journals in the fields of science, technology, and medicine. In its investigations, the Library determined that there were significant differences on many levels between e-journals in these fields and electronic resources in the performing arts.
The major issues investigated by the project can be divided into two realms: content development and implementation planning for a new electronic archive.
The Library's first objective was to identify the publishers of electronic journals and related resources in the performing arts and prioritize them in terms of their research value. Building on earlier, preliminary work in preparation for the project and ongoing work to identify such resources by the staff of the Library for the Performing Arts, the Library was able to identify a significant number of performing arts titles. The Library also began investigating intellectual property issues and the development of formal agreements with electronic publishers to cover the respective rights and responsibilities of both parties in developing a digital archive. An investigation of the potential growth of the content of the archive was also undertaken.
Concurrently, the Library was able to investigate the wide range of technical issues involving system design, source and method of content delivery, and hardware and software requirements in its implementation planning for the archive. Additionally, the Library considered potential organizational models and staffing requirements, access policies, and long-term funding options. The long-term viability of the archive was also considered by examining methodologies to validate the archival processes from a technical perspective, and by exploring the means to assure user communities that electronic resources would be accessible and readable into the future.
The Library appointed as the Project Officer and Principal Investigator Jennifer Krueger, who formerly served as Assistant Director for Electronic Resources at the Science, Industry, and Business Library of the New York Public Library. Ms. Krueger carried out her responsibilities beginning April 2001 and continuing through January 2002, and was assisted by Barbara Taranto. As the Director of the New York Public Library's Digital Library Program, Ms. Taranto provided general oversight of the project and carried out the project's completion through June 2002, in addition to taking a leading role in investigating implementation planning for the archive. Ms. Taranto was appointed the Digital Library Program Director in February 2001 after previously serving as Systems Coordinator for The Research Libraries of the New York Public Library. Prior to this, she worked as a systems specialist at Mount Sinai/NYU Health Center which gave her extensive experience with medical informatics and the long-term preservation of diagnostic imaging. Subject expertise in dance, film, music, and theater was provided by the curatorial staff of the New York Public Library for the Performing Arts. Additional input was provided by members of the Library's information technology staff and also by Dr. Clifford A. Lynch, Executive Director of the Coalition for Networked Information (CNI), who served as a consultant on this project.
Ms. Krueger, with the assistance of the others mentioned above, conducted extensive work in the area of content development. This included an analysis of the performing arts literature in electronic form, the identification of individual resources for consideration, recommendations for criteria for inclusion, investigation into intellectual property issues, and communication with publishers and legal counsel. Ms. Krueger also investigated work completed by other organizations regarding the establishment of digital archives in terms of content and implementing technology. This included, for example, the minimum criteria established for digital archives by the Council on Library and Information Resources (CLIR) and the Digital Library Federation (DLF),[1] and electronic archival implementation done by the European Union-funded Networked European Deposit Library (NEDLIB) (http://www.kb.nl/coop/nedlib/).
Ms. Taranto conducted further analysis of the technological implementation of the archive. This included the establishment of the means of gathering the content (the "ingest" methodology), codifying the content so it would be readily retrievable, setting storage and retention policies, and developing a delivery strategy. Ms. Taranto conducted a detailed investigation into electronic archive modeling and implementation done by other organizations, such as the Reference Model for an Open Archival Information System (OAIS) (http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf), and other work cited in the section on "Implementation" below. Ms. Taranto worked closely with other participants in the Mellon Electronic Journal Archiving Program regarding technical implementation issues. Both Ms. Krueger and Ms. Taranto conducted investigations into the financial requirements of supporting the implementation of the archive in terms of ongoing content and technology development. Both also worked closely with other participants in the Mellon Electronic Journal Archiving Program, including Stanford University and the other institutions working collaboratively on the implementation phase of the program in the LOCKSS project.[2]
In addition to support for Library staff assigned to the project, funding from the Andrew W. Mellon Foundation provided support for Dr. Lynch and other consultants as well as travel directly related to the project, including site visits to institutions involved in electronic archiving. In addition, the Foundation's support allowed for the purchase of a server that will be used for archiving electronic resources on dance in the collaborative LOCKSS project.
The vast majority of information found in electronic form on the performing arts does not take the shape of peer-reviewed publications and is not the output of scholarly associations or institutions. Instead, this information, for the most part, takes the form of single publications produced by single publishers. The intellectual meat of these publications is not stored as marked-up text, indexed and retrievable through a content management system. Neither are there likely to be persistent style sheets, document type definitions (DTDs), or schemas for storing and rendering output. For these reasons, the scope of the New York Public Library project was somewhat different than the scope of other projects in the Mellon Electronic Journal Archiving Program that were either publisher-based or subject-based, where the content was produced out of large publishing houses.
Although the Library did considerable preliminary work in advance of the project in preparing its original proposal to the Mellon Foundation, much less was known at that time about the characteristics of the electronic publishing base in the performing arts. In contrast, e-journals in science, technology, and medicine have been much more widely studied. The Library's survey of the landscape is a significant contribution of the project.
The scope of the research for the project was determined by the limitations, both financial and technical, of the publishers of performing arts content and their chosen venues of publication. In contrast to that of large houses such as Wiley and Elsevier, the domain of performing arts publishers is rather narrow. Electronic resources tend to be created, rendered, and stored in a single system, often involving a service provider that is not part of the publishing organization and may or may not share information about its digital architecture with subscribers of the service.
Consequently, a major part of the work plan for the project involved analysis of appropriate candidates for long-term archival commitments. It was anticipated at the outset that reaching agreements with these various publishers would be the most protracted piece of the work.
At the outset, it was clear the audience that was aware of significant electronic resources in the performing arts was much narrower than the audience that is aware of STM publications and other areas of academic and scholarly interest that enjoy a wide dissemination. Partly, this is due to the fact that unlike more traditional scholarly publications in print and electronic form, performing arts publications are not routinely repackaged by aggregators or indexed in any commercially available resource. Consequently, these publications are known and promoted solely on the strength of their dedicated but limited readership and on the mixed professional/commercial content of the venue.[3] In fact, the commercial/professional mix of performing arts electronic resources is possibly the most salient feature of these publications. It affects every aspect of their creation, their delivery format, and most importantly, their viability.
As mentioned above, the Library elected to use an expansive definition of the term "electronic journal," considering many intellectually significant resources in electronic form that do not fit the strict profile of e-journals in science, technology, or medicine. Still, the Library restricted its study to electronic resources that were "journal-like," that is, resources produced in a serial fashion containing content of interest for serious research by professionals and scholars. Other significant electronic resources that are not "journal-like" such as collaborative performance Web sites are very useful to sophisticated research. In fact, one of the Library's most highly prized resources is its Theatre on Film and Tape Archive which has amassed a large collection of videotapes of significant professional theatrical productions, the only complete documentation of many important works extant, including major Broadway and other commercial productions. Archiving content found in Web sites about these productions might be something the Library could consider for the future, but since these resources did not fall appropriately within the mission of the Mellon Electronic Journal Archiving Program, they were not included as part of the project. Likewise, the Library did not include Webcasts featuring the work of performing artists.
The staff of the New York Public Library's Library for the Performing Arts has, over the course of many years, developed its own highly prized indices of various resources in the field. As a result, with the growing use of the Internet as a means of quickly and easily publishing valuable resources, the Library for the Performing Arts began to provide links to external online resources on its public Web site (http://www.nypl.org/research/lpa/online.html). The effect of this was two-fold: new and important information was made available to the public, and new and important information was brought to the attention of the Library staff by their Web readership. Professionals, researchers, and publishers of serious performing arts journals solicited the Library's interest in the new venue. The Library began to investigate, evaluate, and ultimately propagate certain trusted publications in the performing arts community. This accumulated index of invaluable print and electronic publications is one of the richest resources of the Library for the Performing Arts. It represents years of research, study, and consideration on the part of the Library's professional staff and its governing agencies. The various indices are available in toto in the various divisions of the Library; a subset of these is available on the Web. These seasoned lists formed an important starting point for the project.
A relational database was created for the project to log the entries and to record specific information about each of the publications. Site URLs were examined for "freshness," and live sites were recorded and entered into the database. A brief description of the content was included in the database and special note was made regarding the number of pages deep into the site a visitor needed to go in order to get to the meat of the content. It is important to note that the substance of performing arts electronic resources can often be buried deep beneath layers of advertisements, job postings, auditions, professional service listings, etc. Finding the content was a significant part of the discovery process.
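The kind of log described above might be sketched as follows. This is only an illustration: the report does not document the project's actual database, so the table name, the column names, and the helper functions are all hypothetical, and SQLite stands in for whatever database product was used.

```python
import sqlite3


def create_log(conn):
    """Create a hypothetical table holding one row per candidate publication."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS publication (
            id               INTEGER PRIMARY KEY,
            title            TEXT NOT NULL,
            url              TEXT NOT NULL UNIQUE,
            is_live          INTEGER NOT NULL,  -- 1 if the URL was "fresh" when checked
            description      TEXT,              -- brief description of the content
            pages_to_content INTEGER            -- pages a visitor must traverse to reach the content
        )
        """
    )


def log_publication(conn, title, url, is_live, description=None, pages_to_content=None):
    """Record one examined site in the log."""
    conn.execute(
        "INSERT INTO publication (title, url, is_live, description, pages_to_content) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, url, int(is_live), description, pages_to_content),
    )


def live_titles(conn):
    """Return the titles of sites that were reachable when examined, in title order."""
    rows = conn.execute(
        "SELECT title FROM publication WHERE is_live = 1 ORDER BY title"
    )
    return [row[0] for row in rows]
```

Recording the depth of the content as its own column, rather than folding it into the description, would allow the later sorting of sites by how deeply their content is buried.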
Once recorded, the entries were analyzed and sorted into three primary categories: sites that were content-light; sites with significant content that needed to be mined by readers; and sites where the content was transparent, that is, where the content was not buried several layers deep within a Web site but could be readily uncovered. The first category was eliminated from further inquiry since, for the most part, the information was contemporary in nature, including vendor information, instruction, workshops, etc. and was only relevant for current use. Consequently, despite its usefulness to the current performing arts community, it held little appeal or use for future researchers and scholars.
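The three-way sort described above can be expressed as a small decision rule. The sketch below is an assumption-laden illustration rather than the project's procedure: the judgment of whether a site has significant content was made by staff, and the depth threshold separating "buried" from "transparent" content (`max_transparent_depth`) is a hypothetical parameter, not a figure from the project.

```python
from enum import Enum


class Category(Enum):
    CONTENT_LIGHT = "content-light"
    BURIED = "significant content that must be mined"
    TRANSPARENT = "significant content that is readily uncovered"


def categorize(has_significant_content, pages_to_content, max_transparent_depth=2):
    """Assign a site to one of the three primary categories.

    has_significant_content is a human judgment supplied by the reviewer;
    max_transparent_depth is a hypothetical cutoff for "several layers deep".
    """
    if not has_significant_content:
        return Category.CONTENT_LIGHT
    if pages_to_content > max_transparent_depth:
        return Category.BURIED
    return Category.TRANSPARENT


def keep_for_review(category):
    """Content-light sites were eliminated from further inquiry."""
    return category is not Category.CONTENT_LIGHT
```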
The set of e-journals that formed the core of the content considered for the project reflected a variety of presentation formats and content organization across the disciplines of dance, film, music, and theater. The list of titles compiled and Web addresses are included in Appendix B.
The publications under consideration were separable into four basic categories:
- Independents/Self Publishers/Web-only Publishers
Examples:
Consumable Online (http://www.westnet.com/consumable), published by Bob Gajarsky, Editor-In-Chief
Ape Culture (http://www.apeculture.com), published by Julie Wiskirchen and Mary Elizabeth Ladd
- University and Scholarly Presses
Examples:
TDR: The Drama Review (http://muse.jhu.edu/journals/the_drama_review), MIT Press
The Journal of Seventeenth-Century Music (JSCM) (http://sscm-jscm.press.uiuc.edu/jscm/v1/no1/editors_note.html), published by The Society for Seventeenth-Century Music, University of Illinois Press
- Commercial Publishers
Examples:
Backstage.com (http://www.backstage.com), published by VNU eMedia, Inc.
Down Beat.com (http://www.downbeat.com), published by Maher Publications
- International Publishers
Examples:
neue musikzeitung (nmz) (http://www.nmz.de/index2.html) and das ist taktlos (http://www.nmz.de/taktlos), produced by Neue Musikzeitung und Autoren (Germany)
Dancing Times (http://www.dancing-times.co.uk), published by The Dancing Times Ltd. (UK)
Inclusion criteria
Not every Web publication in the area of the performing arts may be appropriate for inclusion in an archive. The Reference Model for an Open Archival Information System (OAIS), the set of organizational principles adopted in the project by the Library, requires that a statement of policy or, at the very least, a set of inclusion criteria be established in order for the archive to be built in any sustainable mode. Establishing a set of criteria based on this model is most often done by an advisory board consisting of scholars, professionals, representatives of arts organizations, and librarians from the user community. This mechanism serves double duty. It provides a solid network of individuals who help shape and review selection criteria and arbitrate on issues when necessary. It also guarantees a level of "buy in" from the stakeholder community, an essential component of all large enterprises and one that is sometimes undervalued.
To reach the stage of establishing a review board for the planned archive, the project team compiled a working list of titles for inclusion. This refined list was drawn from the original titles that were culled from the Library for the Performing Arts's listings of online and paper resources. These resources fit the following criteria:
- They were consistent with the current collection development policies of the Library
- They had identifiable publishers
- They consisted primarily of original content
- They were persistent in terms of publishing schedule and format
- They were media rich
Each of the electronic publications selected had to meet the first four of these criteria, and strong emphasis was given to those that also met the fifth.
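The selection rule just stated (four mandatory criteria, with media richness as a preference rather than a requirement) can be sketched as follows. The field and function names are hypothetical, the yes/no values would come from staff evaluation of each title, and ranking media-rich titles first is only one possible reading of "strong emphasis."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One title under consideration; each flag records a staff judgment."""
    title: str
    fits_collection_policy: bool
    has_identifiable_publisher: bool
    primarily_original_content: bool
    persistent_schedule_and_format: bool
    media_rich: bool


def is_eligible(candidate):
    """A title must meet all of the first four criteria."""
    return all(
        (
            candidate.fits_collection_policy,
            candidate.has_identifiable_publisher,
            candidate.primarily_original_content,
            candidate.persistent_schedule_and_format,
        )
    )


def shortlist(candidates):
    """Keep eligible titles, listing media-rich ones first, then by title."""
    eligible = [c for c in candidates if is_eligible(c)]
    return sorted(eligible, key=lambda c: (not c.media_rich, c.title))
```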
Although certain other criteria such as a publication's importance in the field and its recognized authority or longevity were desirable, it was determined that including these criteria would rule out candidates that were not well-established but were worthy of consideration nevertheless. This is not to say that these attributes counted against inclusion, but they were not considered necessary for inclusion. Some of the titles under consideration had ceased to publish on the Web, or anywhere else for that matter. The abiding interest in these publications lay in the fact that they were at imminent risk of being lost to future research.
Consistency with the current collection development policies of the Library
The subject area of the performing arts was chosen as the focus of the project in part because of the collection strengths of the Library and in part because of the concentrated wealth of knowledge found among the professional staff at the Library for the Performing Arts. For the project, this staff drew from their expertise about the nature and long-term stewardship of collections and helped evaluate potential electronic resources with regard to the Library's collection development policy. By extension, the electronic archiving project was the natural and inevitable next step for the Library to make in its long-term strategy to conserve and preserve its materials, and it made sense that the content of the titles nominated was well within the bounds of the Library's current collection practices. The staff evaluated titles individually, and resources such as Critical Musicology (http://www.leeds.ac.uk/music/info/critmus/), for example, which fell within the policy, were identified as candidates for preservation and subject to further evaluation, whereas Web publications such as "CDNOW: Allstar News" (http://www.cdnow.com/cgi-bin/mserver/pagename=/RP/ALLSTAR/main.html), which is primarily a commercial site with no content other than an inventory of products, were not considered further.
Identifiable publisher
One of the many challenges of dealing with small publications, and especially publications on the Web where the means of production can be entirely in the hands of a single operator, is the problem of identifying and locating the person, persons, or agencies responsible for a publication, an important matter in terms of intellectual property and rights aggregation for rights clearance. Simply finding a corporate or personal name claiming to be the publisher of a site is no guarantee that the agent identified has any legal standing in regard to its content. Many Web publications, especially those in the performing arts, "crib" material from other sites (see beat thief at http://www.beatthief.com), or are complex sites that welcome unvetted participation from their readership (see oobr: the off-off-Broadway review at http://www.oobr.com). It is not so much that the publisher does not control the content or disavows the content, but that the publisher may not know, at any given time, the nature of the content, or may not be completely responsible for or capable of content management.
The ad hoc practices and behaviors of some electronic resources in the performing arts may be their most salient and attractive features, but the problems presented by these practices make it close to impossible to "collect" the titles for an archive in any meaningful way. While an agreement might be made with one party involved in the publication, another party may not be reachable or may object to the arrangement. In some cases, where a single agent can be identified, it might very well be that he or she has no clear right to publish the content. This is especially true of electronic resources that provide a substantial amount of streamed media, such as music or film clips. While the text of the site may be the intellectual property of the author/publisher, the media illustration often is not.
Primarily original content
It was considered essential that electronic content consist of original information generated by the publishing source and not information that was repackaged from some other source. The legal ambiguities are considerable for republished digital content generated somewhere other than its primary source which may take on a different format[4] because of bandwidth limitations and service provider restrictions, even if materials are "born digital" (see Soundout at http://www.soundout.net). Print publishers, media companies, film studios, and others all have a stake in how evolving copyright legislation in an electronic environment is drawn and enacted. Until such legislation can address some important issues, what amounts to "buying or licensing" digital rights is unclear at the very best and risky at its worst. Consequently, electronic resources that required extensive legal and monetary negotiations regarding intellectual property were not considered for the project.
Copyright issues are not necessarily insurmountable. However, work in obtaining legal rights to content in many performing arts electronic resources has the potential to turn into a legal quagmire. Even if the content is highly desirable, the cost of doing due diligence on the variety of conditions under which the content might be archived and delivered could far outweigh the benefit gained by preserving the material. Dealing with purveyors who have clear title to materials or at least could indemnify the Library from any liability where rights have not been cleared was considered absolutely necessary.
Publishing persistence
The initial list of performing arts titles consisted of a very wide range of e-publications including many that had substantial content but would generate, with more than occasional frequency, the disappointing message "unavailable." To be fair, this was more likely to be the fault of Internet service providers rather than of publishers: in the current economy service providers have been known to retire with little or no notice, leaving their customers in difficult straits. Still, publications that could not be accessed with some consistency were excluded from consideration.
Planning to archive any serially published materials is most readily done through the establishment of close relationships with the publishers of the content. There is obviously an extensive lead-time required to set up the legal and technical parameters for deposit, and some of the proposed sites failed to conform to any manageable or predictable schedules of publication.
Other publications in the course of a few months continuously reinvented their sites, changing basic organizational formats,[5] intellectual direction, and targeted audience. Still others discontinued publication even though it was clear from the "hit" counter that the site was still actively used. Overcoming the issues raised by dynamic content is a technical challenge that will be addressed later in this report, but in the absence of either a persistent schedule or a persistent intellectual format, the Library ruled out these publications for inclusion in the archive.
Media rich
The project's focus on the performing arts provided the potential to explore the issues raised by electronic publications with embedded multimedia objects. As noted, nearly all the titles under consideration contained some form of non-text material. The amount of audio-visual media included was significant, but lower than anticipated: 45 percent of the titles contained sound and/or video formats. Some publications, such as African Music Archive (http://ntama.uni-mainz.de/~ama) and Ethnomusicology Research Digest (http://www.inform.umd.edu/EdRes/ReadingRoom/Newsletters/EthnoMusicology), had a wide range of content including archived audio, streamed audio, and music performance. African Music Archive, however, offered its readership MIME types and browser-compatible formats, while Ethnomusicology Research Digest relied on the ingenuity of its user base to download and then reformat binary objects into human "readable" files. The probability of successfully archiving standardized formats such as Real Media or QuickTime is much greater than that of archiving idiosyncratic media types. The ingenuity of the site publisher is manageable with human intervention, but daunting when planning an archive based on automated systems for ingest, storage, and retrieval.
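The distinction between standardized and idiosyncratic formats can be illustrated with a minimal sketch of how an automated ingest process might triage harvested files. This is an illustration only, not a description of any system the Library built: the function name `triage` and the allowlist `KNOWN_FORMATS` are hypothetical, and apart from Real Media and QuickTime, which are named above, the listed extensions are assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist of formats that automated ingest is assumed to handle.
# Real Media and QuickTime are named in the report; the remaining entries
# are illustrative assumptions only.
KNOWN_FORMATS = {
    ".rm": "Real Media",
    ".ra": "Real Media",
    ".mov": "QuickTime",
    ".qt": "QuickTime",
    ".mp3": "MP3 audio",
    ".html": "HTML",
    ".gif": "GIF image",
    ".jpg": "JPEG image",
}


def triage(filenames):
    """Split filenames into (automated, manual) lists.

    automated: (filename, format label) pairs for recognized extensions.
    manual: filenames whose extension is missing or unrecognized and
            which would therefore need human intervention.
    """
    automated, manual = [], []
    for name in filenames:
        ext = PurePosixPath(name).suffix.lower()
        if ext in KNOWN_FORMATS:
            automated.append((name, KNOWN_FORMATS[ext]))
        else:
            manual.append(name)
    return automated, manual


if __name__ == "__main__":
    auto, manual = triage(["clip.MOV", "talk.rm", "field_recording.bin"])
    print("automated:", auto)
    print("needs review:", manual)
```

In a sketch of this kind, every file that falls into the second list represents staff time, which is why a site that relies on its readers to reformat binary objects is so much costlier to archive than one that publishes in recognized formats.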
The vast majority of e-publications containing media objects did not venture into unusual file types or file types unique to the performing arts, such as MIDI-based musical material or computer-based dance notation, which would represent specialized and complex preservation issues or areas of uncharted technology involving "new media." Here, the term "new media" refers to new types of electronic formats, not new types of performance (see Music by Light at http://www.itp.nyu.edu/GALLERY/music_light.html). For certain types of performing arts sites that cannot be easily classified as e-journals or e-zines, there may be a potential audience for such new media, but for the purposes of the project these titles or sites were not considered for inclusion in the archive.[6]
Almost all of the titles that were under review included some form of non-text content. Approximately 85 percent of the electronic resources contained images and graphics beyond that which could be described as organizational logos or publication mastheads, and these resources clearly presented issues for concern regarding intellectual property rights. In the area of music and recorded sound, a further 45 percent of the listed titles contained various forms of audio media. Some of this content was available as MP3 files, some as QuickTime. Many sites that were considered, such as das ist taktlos (http://www.nmz.de/taktlos/index.shtml), provided numerous streaming excerpts from radio broadcasts, live performances, and discussions with musicians, composers and reviewers. Aside from the copyright issues involved, which may or may not be covered by formalized waivers given the legal standing of the publisher and the publisher's ability to clear the rights with all parties involved, the technical challenges this presents were not insignificant.[7] Streamed media, whether it is audio, video or Webcast, presents special considerations for archiving that have not been much explored. These formats, including Real Media files or QuickTime files, in fact constitute a third-, fourth-, or sometimes even fifth-generation derivative of a digital source. In many cases, the digital source is itself a reformat of an analog recording. That aside, streamed media is not delivered as a "unit." It is pieced out across the telecommunication pipeline in sizeable, downloadable chunks. Streamed media requires browser plug-ins and local transmission-speed configuration files in order to be rendered properly on the desktop. Unlike other binary objects that can be harvested directly and stored in an archive, streamed media published on the Web is not so much a digital object as an event. It certainly can be reproduced locally but it cannot be easily harvested. 
In addition, some sites offer multiple streams of the same material at different levels of quality (where higher quality streams would be selected for recipient sites with higher bandwidth connections to the Net), raising additional considerations.
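One of the considerations raised by multiple streams is which rendition an archive should keep. The sketch below shows one possible policy, retaining only the highest-bitrate rendition of each item. The policy itself, the `Rendition` record, and its field names are all assumptions made for illustration; the project did not settle on a rule of this kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    item_id: str  # hypothetical identifier for the underlying recording
    url: str      # location of this particular stream
    kbps: int     # advertised bitrate of this stream


def pick_archival_renditions(renditions):
    """Return a dict mapping each item_id to its highest-bitrate rendition.

    When two renditions of an item share the same bitrate, the one
    encountered first is kept.
    """
    best = {}
    for r in renditions:
        current = best.get(r.item_id)
        if current is None or r.kbps > current.kbps:
            best[r.item_id] = r
    return best
```

An archive might equally decide to keep every rendition, since each is a distinct published object; the sketch simply makes the storage-versus-completeness trade-off concrete.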
Despite the challenges that media-rich resources present, it was felt necessary to give some priority to these resources in identifying potential content for the archive, specifically in order to address such challenges.
Further refining the subject focus
At a certain point in the project, as the number of potential electronic resources grew while, at the same time, cost projections for maintaining the archive were being developed, it appeared necessary to narrow the subject focus of the project if there was to be a hope of taking it to an implementation phase. The key strategic consideration used in narrowing the field was to emphasize congruence with existing programmatic commitments at the Library.
Of the titles that remained after the process of elimination based on the five criteria noted above, the majority fell into one of two areas: music and dance. The music titles were by far more electronically sophisticated than most of the dance journals, but several raised significant rights issues and the likelihood of coming to mutually agreeable terms for access with publishers and artists was not encouraging. Furthermore, the music resources, while more established and containing more content than some other publications, did not complement the subject emphasis of the specialized collections of the Music Division at the Library for the Performing Arts.
On the other hand, the thirty-some dance titles that survived the initial cut were more in line with the goals the Library had set for itself at the outset of the project. The dance titles, which are listed in Appendix C, can be characterized in the following ways:
- Journals were published both nationally and internationally
- Most types of dance and movement performance were represented
- Publications were split roughly between "born-digital" editions and "digital facsimiles" of hard copies
- Publishers were identifiable and locatable
- Media clips, both sound and video, were found in many of the publications
- New content, for the most part, was still published as "issues"
- Most publications were currently offering past issues that have been "archived" on their respective sites
Another factor that was taken into consideration was the comparatively limited number of institutions and organizations focused specifically on dance research and scholarship, the New York Public Library for the Performing Arts being one of them. The Jerome Robbins Dance Division of the New York Public Library for the Performing Arts is highly respected both nationally and internationally and is considered to be one of the most reliable and rich repositories of materials on dance anywhere in the world. The Library is a founding member of the Dance Heritage Coalition and hosts the Coalition's Web site (http://www.danceheritage.org). Given the leadership role of the Library's Jerome Robbins Dance Division among dance libraries, it is quite likely that the Library might be the only organization that could consider taking on a major role in establishing an electronic archive for dance.
Considering this, the Library was in a good position to leverage some of its existing relationships in the dance community to help solicit participation and solidify commitments from publishers. Given this and the special nature of the dance e-publications, the Library made the decision to focus its efforts "narrow and deep," providing strong depth of coverage in the very specific subject of dance.
By focusing on dance titles the Library believed it could gain a significant understanding of the process involved in collecting, storing, and delivering electronic content that was not already normalized in some other system. The challenges posed by the diversity of document types, media types, and publishing genres were well represented by the dance titles selected.
While in a perfect world it would be best to be able to archive as much electronic content as possible, it was felt that developing an archive containing a modest number of electronic resources on dance would make a contribution to both the dance field and electronic archiving. The challenges for archiving this material are obviously quite different from the challenges of archiving large numbers of regularly generated text files from content management systems, but given the range of participants involved in the Mellon Electronic Journal Archiving Program, it appeared that certain issues of scale were going to be addressed by other parties, and that the Library would make its most significant contribution by exploring areas not particularly relevant to the STM-type archive.
The success or failure of an archive depends in large part on the good will and cooperation received from the publishing community. For obvious reasons, the archive would be content-less without the publishers' material, but more importantly, without the good will of the parties involved there are no grounds for negotiating or resolving issues as they arise.
This was a lesson learned and generously shared with the participants in the Mellon Electronic Journal Archiving Program by the PubMed Central team at the National Institutes of Health in Bethesda, Maryland. While PubMed's experience was most obviously applicable to the Mellon program participants working in science, technology, and medicine, this experience was also applicable to the performing arts and other fields.
Dr. David Lipman, Director of the National Center for Biotechnology Information at the National Library of Medicine, and his programming team shared the various technical challenges involved in ingesting material from a wide variety of scholarly publishers, many of whom are small, single-title entities like those found in the performing arts group. In some cases, months of negotiation were necessary between depositors and PubMed Central before an actual document was submitted to the archive. Most of this time was spent on the development of a document type definition (DTD) for ingest. Some time, however, was lost in trying to work with publishers who were less than enthusiastic, in the belief that, with enough technical and professional support from the National Library, those publishers would become more receptive to the requirements of the archive.
Dr. Lipman reported that after eighteen months, working with a range of titles, they had come to the conclusion that the only viable arrangements were those where the publishers' involvement was entirely voluntary. Interest in the project could not be won by any technical incentive, and the possibility of providing a financial one was slim.
These lessons were valuable ones for the New York Public Library in dealing with publishers in the performing arts. It should be noted, however, that the PubMed project was operating in a very different milieu from that of the performing arts, where it is only a small exaggeration to say that publishers cannot even afford backup disks. Performing arts publishers are typically very aware of the need to preserve information, but any reluctance on their part to participate with regard to financial considerations must be cast in a very different light. Such publishers, it was found, were supportive of the development of an archive, and clearly saw the benefits in the possibility that the Library might be able to offer server space and the technological wherewithal to make an archive come about.
For many of the publishers of electronic resources in dance considered by the Library, the Web is the only publishing medium: no print copy exists.[8] Approximately 80 percent of these e-journals and e-zines are currently providing their own online archive, subject to the terms of their Internet provider and the amount of space each can afford to maintain and expand upon with new content. In some fields, in an attempt to underwrite this service, past issues, which are often indexed by issue date, may be searched by registered readers or by readers who pay a fee to the publisher.
Publisher-based archives are far from stable, however, as witnessed by the almost overnight disappearance of rather successful electronic publications such as The Friends of Photography (http://www.friendsofphotography.org) and the original Time Digital (continued in a very different form as ON Magazine, which is itself no longer published), among others.[9] Unlike some of their more broadly based cousins, which might withstand the loss of an electronic presence, dance e-journals have neither the means nor the wherewithal to assure the public that their online archives will persist.
The short-lived nature of Web editions and the economic realities of producing art publications argue for some degree of receptiveness on the part of the individual publishers. Consequently, when the Library took a straw poll of e-publishers in February 2001, it was not entirely surprised by the positive response. A sample of twelve publishers was selected, and each felt their audience would find the archive useful. All but one expressed interest in the development of an electronic archive; the lone dissenting publisher expressed concern, with little explanation, about losing advertisers due to the establishment of the archive. Regarding the idea of making the archive freely available, eleven of the twelve responded positively and also felt it would be unnecessary to limit access for any period of time after publication. All but one of the publishers were willing to have content from their Web sites harvested by the Library or another archive administrator, although only half were willing to provide files of Web content directly for storage. Seven of the twelve responded that they allowed their site to be crawled by such resources as Google or the Internet Archive; the others did not know or did not respond. On a question regarding storage, the publishers indicated that the file size of their individual publication issues ranged from 7 to 100 megabytes. Ten of the twelve indicated they planned to increase multimedia content, although a few noted this might take years.
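Because several publishers did not know whether their sites could be crawled, a harvester could consult the site's robots.txt file, the conventional signal Web sites use to address crawlers, before collecting anything. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The function name `may_harvest`, the user-agent string, the sample rules, and the example.org URLs are all hypothetical, and the survey itself did not ask about robots.txt.

```python
from urllib.robotparser import RobotFileParser


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for an imaginary dance e-journal.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /subscribers/
"""

if __name__ == "__main__":
    agent = "example-archive-harvester"  # hypothetical user-agent name
    for url in (
        "http://example.org/issues/2001-02/index.html",
        "http://example.org/subscribers/archive.html",
    ):
        print(url, may_harvest(SAMPLE_ROBOTS, agent, url))
```

A permissive robots.txt indicates only that the site does not object to automated access. It does not grant the right to store or redistribute content, so a check of this kind would supplement an agreement with the publisher rather than replace it.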
The following illustration provides a matrix of possible relationships between publishers and the New York Public Library. The publishers' participation ranges from the most active — where the Library receives everything it wants, when it wants, and how it wants, with no cost and with total control over delivery — to the publishers having no participation at all and the Library essentially adopting a "risk management" approach to archiving content. When publishers have no formal agreement and the content is freely available on the Web, it may be tempting to harvest sites until an objection is raised. While this may not be ideal, it is perhaps the only way to handle some of the more elusive candidates. However, this last arrangement seems to be the least tenable and the least desirable. It is somewhat akin to a collection development policy based on taking what you can get and not what you want.