
4.3 Pathways to E-Learning in Science and Beyond

This section describes the National Science Digital Library (NSDL) and four related scientific digital libraries, alongside a complementary community of practice in e-learning, MERLOT (Multimedia Educational Resource for Learning and Online Teaching). These services are increasingly anchored and sustained by discipline-based entities as they move from a collection-driven approach to an emphasis on pathways to resources and community participation. Taken together, they serve the full spectrum of "K to gray" learners and educators.

Over the past six years, NSDL has distributed an estimated $125 million in funding to more than 200 projects. While the discussion below concentrates primarily on NSDL's function as an aggregator, harvesting digital resources for discovery via a unified search and retrieval interface, it is important to acknowledge from the outset NSDL's leading role in facilitating research collaboration and engaging stakeholders across public, private, university, K-12, and government sectors in strategic planning for the effective delivery of digital services. NSDL serves a crucial function at the national level by re-thinking digital library architectures (Lagoze et al. 2005), developing and promoting best practices (http://oai-best.comm.nsdl.org/), creating generic tools and service applications (http://nsdl.org/resources_for/library_builders/tools.php), conducting research into user needs (California Digital Library 2004, Hanson and Carlson 2005), and advancing techniques in large-project management and participant involvement (Giersch et al. 2004).

The SMETE Open Federation, launched with NSF NSDL Collection and Core Integration funding, includes among its membership more than forty organizations and digital libraries that share the common purpose of advancing digital libraries in science education. The other services discussed in this section are all members of SMETE. NEEDS, BEN, and DLESE are leaders in their respective communities (engineering, biological sciences, and earth science) in building effective digital library services. Although MERLOT's user community is multi-disciplinary, it is included in this section because of its prominent role in science education. It differs from most of the other services under review in this report in two important ways: (1) it is a membership-based organization with a formal dues structure that dictates levels of participation and (2) it does not make its metadata freely available for OAI harvesting. MERLOT is particularly known for its peer-review practices and community-developing strategies.

4.3.1 NSDL: National Science Digital Library

Update Table 14: NSDL based on DLF Survey responses, Fall 2005
The National Science Digital Library (NSDL)
ORGANIZATIONAL MODEL: National Science Foundation (NSF)
SUBJECT: Science: STEM (science, technology, engineering, mathematics)
FUNCTION: A digital library of exemplary resource collections and services, organized in support of science education.
PRIMARY AUDIENCE: K-12 teachers, Librarians, NSDL library builders, University faculty
SIZE: 1.1 million items (265% growth) from 569 collections of which 48 are NSF-funded NSDL collections. 92% of the item-level records are derived from the top 20 collections.
USE: From May to September 2005: unique daily visitors jumped from 8,755 to 11,013; page views increased from 30,106 to 50,440 with 4.65 page views per visit (up from 3.87 in May).
ACCOMPLISHMENTS:
1. Improved search service. M
2. Improving NSDL data repository using FEDORA.
3. Redeveloped NSDL.org web site that:
--Allows users to self identify by audience on the homepage in the following categories: K12 Teachers; Librarians; NSDL Library Builders; University Faculty; and First Time Users.
--Features periodically updated exhibits, crafted by section editors, for each audience category including: "Top Picks," "Resources of Interest," "Using NSDL," "Research Articles," "Newsfeeds," and an "Events Calendar."
--Provides a one-click connection to browse by science, technology, engineering and mathematics topics from the homepage.
Based on user testing and feedback the new web site design places more emphasis on:
--Active, interactive engagement of users, Example--"Using NSDL";
--Being externally focused, Example--"Newsfeeds";
--NSDL.org as an educational tool, Example--"Resources of Interest";
--Addressing users' educational needs, Example--"Research Articles";
--What NSDL has/is, Example--"Browse by Topic," and;
--What users want to do or know, Example--"Ask NSDL."
CHALLENGES:
1. Lack of funding to offer more teacher workshops in how to use NSDL through organizations and school districts to increase usage in schools.
2. Great diversity in evaluation methods and tools across 190+ NSDL digital library projects.
3. Lack of a well-funded corporate and foundation outreach program to diversify sustainability options.
TOOLS OR RESOURCES NEEDED: Increased funding for the National Science Foundation particularly EHR (Education & Human Resources).
GOALS OF NEXT GENERATION RESOURCE: In order to increase overall NSDL usage and interactive communications through teacher workshops, professional conferences, and other outreach and communications events and activities, user testing results were analyzed in recreating NSDL.org as a useful educational tool that educators and learners in particular would use repeatedly. Leveraging multiple online and face-to-face interactions is a top priority as repeat users become contributors in a timely and transparent way in the next generation of NSDL.

Since the program's inception in 2000, the National Science Foundation's Directorate for Education and Human Resources (EHR) has made nearly 220 awards totaling more than $125 million to develop the National Science Digital Library (NSDL). [[115]] The four major funding streams are defined as follows:

  • Pathways [replacing "Collections" in FY04] projects are expected to provide stewardship for the content and services needed by major communities of learners.
  • Services projects are expected to develop services that support users, resource collection providers, and the Core Integration effort and that enhance the impact, efficiency, and value of the library.
  • Targeted Research projects are expected to explore specific topics that have immediate applicability to collections, services, and other aspects of the development of the digital library.
  • Core Integration coordinates and manages the core library, develops the library's central portal and infrastructure, and engages and supports the NSDL community.

Table 18: Summary of NSF NSDL Funding FY2000 through FY2005
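Category | FY2000 | FY2001 | FY2002 | FY2003 | FY2004 | FY2005 | Total
Proposals Submitted | 90 | 109 | 156 | 193 | 144 | 120 | 812
Total Dollars Requested (in millions) | $59 | $64 | $92 | $110 | $126.50 | $83 | $534.50
Funded Budget (in millions) | $13.65 | $25.13 | $26.76 | $22.80 | $19.22 | $18.00 | $125.56
Funded Proposals | 29 | 39 | 55 | 44 | 27 | 22 | 216
Collections (FY2000-03) | 13 | 18 | 35 | 22 | 0 | 0 | 88
Pathways (FY2004 - ) | 0 | 0 | 0 | 0 | 4 | 9 | 13
Targeted Research | 1 | 4 | 6 | 8 | 6 | 2 | 27
Core Integration (CI) | 6 | 3 | 3 | 3 | 3 | 3 | 21
Subcontracts (part of CI) | 0 | 4 | 5 | 5 | 6 | 7 | 27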

Source: NSDL 2005 annual report; Zia 2001-2005 in D-Lib Magazine & email correspondence, March 29-30, 2006. [[116]]
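
The row totals in Table 18 can be cross-checked with simple arithmetic. The following Python sketch transcribes four rows of the table and recomputes their totals. The variable names and the `share` helper are illustrative assumptions, and the two derived rates are calculated here from the table's figures, not quoted from the NSDL annual report.

```python
# Rows transcribed from Table 18, FY2000 through FY2005.
# Names are illustrative; dollar amounts are in millions.
proposals_submitted = [90, 109, 156, 193, 144, 120]
dollars_requested = [59, 64, 92, 110, 126.50, 83]
funded_budget = [13.65, 25.13, 26.76, 22.80, 19.22, 18.00]
funded_proposals = [29, 39, 55, 44, 27, 22]


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


total_submitted = sum(proposals_submitted)
total_requested = sum(dollars_requested)
total_budget = round(sum(funded_budget), 2)
total_funded = sum(funded_proposals)

# Derived figures (computed here, not stated in the source table).
proposal_success_rate = share(total_funded, total_submitted)
budget_award_rate = share(total_budget, total_requested)

print(total_submitted, total_requested, total_budget, total_funded)
print(proposal_success_rate, budget_award_rate)
```

Under these figures, roughly one in four submitted proposals was funded over the six fiscal years, and a similar fraction of the requested dollars was awarded.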

NSDL's initial emphasis on Collections has shifted over the past two years to configuring and integrating digital resources into sustainable services by anchoring them in established communities of practice thereby "enabling learners to 'connect' or otherwise find pathways to resources appropriate to their needs" (Zia 2006). Collections' funding peaked in 2002 when there were 35 projects, accounting for 68 percent of the total NSDL budget. By 2005, NSDL funding was about equally distributed between Core Integration and the three project tracks, with Services receiving an estimated 28 percent and Pathways, 18 percent of new project funds. To date, Pathways are under development in the biological sciences, physics and astronomy, computational science, middle school teacher resources, materials science, mathematical sciences, engineering, multimedia resources for the classroom and professional development, and resources and services for community and technical colleges. [[117]] In FY06 proposals will be accepted for the Pathways track only or "for supplemental funding from existing projects to extend or enhance their services, collections, or targeted research activity so as to enlarge the user audience for NSDL or improve capability for the user." [[118]] Two projects under review in this report, BEN (BiosciEdNet) and SMETE/NEEDS are exemplars of NSDL Pathways. In addition, DLESE, funded by NSF's Directorate for the Geosciences, serves as an NSDL Earth Science node.

NSDL's 2005 annual report [[119]] identifies five areas where it is concentrating its efforts to improve education, each with representative project case studies:

  • Evaluation, the continuous process of measuring the impact of NSDL activities on learning.
    • Case Studies: Teachers' Domain, The BEN Portal, Kinematics Library
  • Classroom Resources, the nuts-and-bolts work of putting new tools into teachers' hands.
    • Case Studies: Starting Point, TeachEngineering, Instructional Architect
  • Technology, the massive effort to build and grow a hidden grid that holds digital libraries together.
    • Case Studies: FEDORA Holds Everything, AMSER and CWIS, Searching for Math Formulas (Wolfram Functions)
  • Community Building, encouraging learning groups to use the NSDL to pursue their questions.
    • Case Studies: Virtual Math Teams, CHEM Collective, Interactive from SHODOR, Environmental Resources Library
  • Informal Learning, the extension of NSDL resources to libraries, museums, and publications.
    • Case Studies: OCKHAM, Scientific American Online, Exploratorium Online

Collections in NSDL

According to NSDL's online collection policy, "NSDL is a collection of other digital library collections." Collections may consist of a single resource or thousands of resources. All NSDL resources are associated with at least one other external collection in order to associate them with a "responsible organization or project." Collections and resources are selected by NSDL Program-funded Pathways and Collections Projects and by the NSDL Director of Collection Development. In addition, collections and resources are recommended by a team of volunteer recommenders (mostly science librarians), NSDL community members, and also the general public. These recommendations are checked against the selection criteria and approved by the Director of Collection Development for inclusion in the NSDL.

There are two broad selection criteria that are intended to be inclusive in order to allow a spectrum of quality and review:

  • appropriate to fulfilling the mission of NSDL
  • matches the subject scope of NSDL
Users are advised: "NSDL currently contains information about all NSDL funded collection projects, other government funded STEM collections, and other collections associated with universities, private organizations, and companies that fit the subject scope of NSDL." [[120]]

NSDL collections contain freely available and restricted-use resources. When access is limited, the collection should have open access metadata describing the resources.

As of May 2006, there are 660 collections accepted into NSDL, of which 121 have item-level records. [[121]] Twenty NSDL collections account for 92 percent of the estimated 1.2 million item records. (Lagoze et al. 2006a report on their experience in harvesting from 114 NSDL collections via OAI; 37 collections come from only eight providers.) The top twenty data providers range in size from arXiv.org with nearly 340,000 items to DLESE with 7,200 items. The four largest collections also figure among the top twenty in OAIster: arXiv.org, the Office of Scientific and Technical Information (OSTI) OAI Repository, CITIDEL (the Computing & Information Technology Interactive Digital Educational Library), and Wolfram Functions. [[122]] Of the 88 collection projects funded via NSDL (representing 64 unique collections) from FY00 to FY03, an estimated 75 percent have item-level metadata in NSDL. On balance, NSDL-funded collections represent a very small portion of NSDL content: CITIDEL is the largest with more than 100,000 records, followed by DLESE with 7,200 items; the remaining 46 or so NSDL-funded collections have 3,000 or fewer records.

Publisher Partnerships in the NSDL [[123]]

In its work over the past two years, the Core Integration (CI) team has proceeded from the premise that in order for the NSDL to become a resource of choice, used frequently by a broad range of teachers and students on a national scale, it is necessary to engage the interest and participation of the scientific textbook and software publishing community. This community includes both non-profit and for-profit organizations that control a substantial percentage of the high-quality educational science materials currently being produced for teachers and their students.

In this effort, the CI team took steps to engage this community in a collaborative and productive manner, so as to ensure that the NSDL becomes a strong and valued partner rather than a competitor to the traditional science publishing community. Science publishers possess assets that will become critical to the future success of the NSDL, including an efficient and stable mechanism for acquiring and peer-reviewing high quality content from scientists and science teachers; an effective system for editorial development, design, and production of this content; excellent market research and evaluation mechanisms; established models for contracts, licenses, copyright, and intellectual property management; and a reliable system for marketing and sustainability. In addition, many of these publishers work with vendors who provide technical infrastructure and support for schools.

Through its access management and publisher relations efforts, the CI team has established a formal means to engage the science publishing community, including a means to enable controlled access to their content. These activities will ensure that the NSDL reaches its full potential as a functional, valued, and highly used resource, and serves as a model for partnerships with other collaborators in the future.

As of May 2006, NSDL CI has established relationships with 18 science publishers. Many of these have begun to supply metadata for their materials which then appears in the NSDL central portal interface. The publishers include:

  • American Mathematical Society
  • American Physical Society
  • Bedford, Freeman, and Worth
  • BioOne
  • Blackwell Publishing
  • Cambridge University Press (book and journal programs)
  • Elsevier Books
  • Houghton Mifflin Company
  • John Wiley and Sons
  • McGraw-Hill
  • National Academy Press
  • Nature Publishing Group
  • Oxford University Press (book and journal programs)
  • Pearson Education
  • Scientific American
  • Springer Science+Business Media
  • Tom Snyder Productions (software division of Scholastic)
  • Tool Factory (educational software)
Users can browse collections alphabetically by title or by an expandable subject tree (branching out from Education, Health, Mathematics, Science, Social Studies, and Technology). Collections can also be identified through the interactive visual "NSDL At a Glance" tool, organized by topics from The Gateway to Educational Materials (GEM) subject scheme.

Source: http://nsdl.org/browse/ataglance/browseBySubject.html (February 2006)

Since 2003, NSDL has developed access points to its content by audience: K12 Teachers, Librarians, NSDL Community, University Faculty, and First Time Users.

Table 19 summarizes the widely varying results retrieved in a search for resources relevant to University Faculty about Astronomy. Browsing by topic identifies 68 collections relevant to Astronomy. A keyword search retrieves more than 11,000 resources. The University Faculty portal contains one "top pick" relevant to Astronomy. A search of the Virtual Reference Desk, AskNSDL question-and-answer archives and resources (requires registration and log-in) compiled by an NSDL reference desk librarian, locates 12 collections relevant to Astronomy (but the list does not include the Physics and Astronomy Pathway found through the University Faculty portal). In addition to blogs, Web sites and other types of resources, AskNSDL has 68 archived questions from users related to Astronomy.

Table 19: Astronomy Results from Various Portals, Pathways, and Navigational Features
Browse by Topic: 68 resources
Search by Keyword: 11,091 resources
University Faculty Portal
Top Picks: 1 (of 14)
To Physics and Astronomy Education Resources
Through a partnership of authors and organizations ComPADRE acts as a steward for the educational resources used by broad communities in physics and astronomy by creating and sustaining a network of collections that provide learning resources and interactive learning environments. ComPADRE resources positively influence physics and astronomy students and their teachers in both individual and collaborative settings.
Resources of Interest: 0 (of 8)
Using NSDL: 0 (Although at least 2 of 5 resources featured are of potential interest)
1. Sunshine Applet
This Java applet shows sun exposure and intensity for any latitude and longitude, and any date during the current year. The times of most intense and dangerous sunshine are given through a chart and global map, as well as a graph indicating the current location of the Sun in terms of strength. There is also an indication of sunrise, sunset, and Sun culmination.
2. ATHENA Mars Exploration Rovers
Cornell University, NASA's Jet Propulsion Lab, and Bill Nye present information on the Mars Athena Exploration Rovers. Mission updates from Athena Principal Investigator Steve Squyres, technical briefings, images, at-home experiments for kids and lesson plans complement details of mission goals and payload.
Research Articles: 0 (of 7)
News Feeds: 0
Events Calendar: 0
AskNSDL: Home: Science: Astronomy: Resources
Blogs, Feeds, Podcasts: 5 resources
FAQs: 9 resources
Suggested Web Sites: 10 resources
Archived NSDL Scout Reports: 1 resource
NSDL Collections: 12 resources
Core List of Astronomy Books
Exploratorium. Ten Cool Sites: Astronomy
NTRS: NASA Technical Report Server
PhysLINK.com - Reference and Education - Physics, Astronomy and Engineering
SEGway: The Science Education Gateway
Smithsonian Institution
Spaceflight now - The leading source for online space news
The Parallax Project
The Sun-Earth Connection Education Forum
Virtual telescopes in education (VTIE)
Other "Ask an Expert" Archives: 1 resource
Educator Resources (lesson plans): 1 resource
AskNSDL: Home: Science: Astronomy: Archived Questions: 68 Questions & Answers

Source: http://www.nsdl.org/ (February 2006)

Given the diversity of these sample results, users should be encouraged to experiment with different search, browse and navigational functions to see which best suit their needs.

Search Features

In 2003, NSDL offered both simple and advanced search features. Simple search relied on keywords with the ability to limit by Type of Resource (Collections, Items, News, Exhibits, Collections with reviews, Items with reviews) or by Resource Format (Text, Image, Audio, Video, Interactive Resource, Data), whereas advanced searches allowed Boolean commands limited to keyword anywhere, keyword in content, title, author/creator/contributor, subject and format/genre. In response to user feedback, NSDL simplified its approach and now offers a single search box for keywords with the option to limit the search by Resource Format (same as above) or Grade Level (Graduate, College, High school, Intermediate elementary, Middle school, Primary elementary). In spring 2006 NSDL added an option to Search resources (i.e. educational resources) or Search NSDL.org (i.e. NSDL community sites or the NSDL.org site). As of this writing, these labels are under review and additional information describing the options will be added once approved.

Source: http://nsdl.org/search/ (May 2006)

Search Tips explain that searches are not case sensitive and that quotation marks should be used around phrases. Boolean commands are no longer available, nor is there any explanation whether or not there is automatic ANDing of search terms (a common feature of most general search engines) or truncation (or other wildcard functions).

Results are returned ten per page with brief annotations and links to "View all related information" (provides item and collection-level metadata, as available) and "Include/Exclude results like this" (enables filtering by collection). Users can navigate to previous or next pages but cannot sort results or jump to a specific results page. When the revised Web site went live in late October 2005, a standard feature was added so all NSDL.org pages can be emailed via the "Email this page" link in the footer. However, there are no post-processing features to save or export results by other means. When a search does not produce any results, users are advised to consult the search tips, browse the collections, or check back as NSDL collections continue to grow. In March 2006, NSDL implemented "Did you mean" spelling suggestions. A search for "crystalography," for example, returns a "Did you mean" prompt suggesting the corrected spelling, crystallography.
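
The report does not describe how NSDL generates its "Did you mean" suggestions. As a rough illustration of the general technique only, the Python sketch below uses the standard library's `difflib.get_close_matches` to find the nearest term in a small vocabulary. The vocabulary list, the `did_you_mean` function name, and the 0.8 similarity cutoff are all assumptions made for this example.

```python
import difflib

# Hypothetical vocabulary; a real service would draw on its own index terms.
VOCABULARY = ["crystallography", "cryptography", "cartography", "chromatography"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a suggested spelling, or None if the query is already a
    known term or no vocabulary term is similar enough."""
    term = query.lower()
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(did_you_mean("crystalography"))
```

A correctly spelled query yields no suggestion, which mirrors the behavior a user would expect from such a prompt.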

Search Results

When conducted in late January 2006, a sample search for the keyword <crystallography> produced curious results (Table 20). Without deploying any search delimiters, the basic term query returned 633 results. Filtering the results to exclude the collection of the first retrieved item reduced the result set to 616 resources. When the link "search within this collection" (e.g., DLESE) was used, the results increased dramatically to 7,175. Beginning anew with the search term <crystallography>, but limiting by grade level, produced a wide range of results, with 20,373 hits at the high school level. [[124]] Given the proviso that not all resources contain format metadata and, therefore, relevant results may be excluded, it was alarming to retrieve much higher and wildly different returns when the format delimiter was invoked: for example, over 700,000 texts and 34,000 images pertaining to crystallography, when the keyword search alone retrieved 633 resources. Based on Brogan's query of early February 2006, it became apparent that NSDL was combining the keyword and the delimiter with OR operators (returning resources that matched either one) rather than requiring the presence of both (i.e., through AND operators). NSDL modified its newly implemented search interface and corrected these errors on the production site in mid-February 2006. The results of the identical search conducted after the modification are dramatically different.
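
The inflated counts are what one would expect when a keyword and a delimiter are joined with OR rather than AND. The Python sketch below illustrates the effect on a toy index; the records, field names, and `search` function are invented for illustration and do not represent NSDL's actual search implementation.

```python
# Toy records; titles and formats are invented for illustration.
RECORDS = [
    {"id": 1, "text": "introduction to crystallography", "format": "text"},
    {"id": 2, "text": "crystallography image gallery", "format": "image"},
    {"id": 3, "text": "plate tectonics overview", "format": "text"},
    {"id": 4, "text": "volcano photographs", "format": "image"},
    {"id": 5, "text": "ocean currents lesson", "format": "text"},
]


def search(keyword, fmt, combine="AND"):
    """Return ids of records matching the keyword and format delimiter,
    joined with either AND or OR."""
    hits = []
    for record in RECORDS:
        keyword_match = keyword.lower() in record["text"].lower()
        format_match = record["format"] == fmt
        if combine == "AND":
            matched = keyword_match and format_match
        else:
            matched = keyword_match or format_match
        if matched:
            hits.append(record["id"])
    return hits


or_hits = search("crystallography", "text", combine="OR")
and_hits = search("crystallography", "text", combine="AND")
print(len(or_hits), len(and_hits))
```

With OR, the delimiter widens the result set to every text resource in the index plus every keyword match; with AND, it narrows the keyword matches, which is the behavior users expect from a filter.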

Table 20: Search Results for <crystallography> with and without delimiters
SEARCH QUERY | RESULTS: January 31, 2006 | RESULTS: March 21, 2006
KEYWORD: crystallography
1st Item Retrieved: crystallography
According to the annotation: This site is a link from a mineralogy database hosted by webmineral.com. "View all related information" indicates that this item is derived from the DLESE (Digital Library for Earth Science Education) Collection.
Include/Exclude results like this
Exclude This Collection (e.g., DLESE)
Search within this collection (e.g., DLESE)
SEARCH BY GRADE LEVEL: crystallography
High school
Middle school
Intermediate elementary
Primary elementary
SEARCH BY FORMAT: crystallography
Interactive resource

However, as NSDL officials explain, determining how "boosting and filtering" occurs is not entirely straightforward; in the many cases where the data provider or collection does not provide resource-type information in its metadata, relevant results may be lost from the search and the results narrowed. Even so, other problems remain apparent in this new sample search. The 12 results for Graduate-level resources contain three apparent duplicate references to Reciprocal Net. Users have to link to another screen to find out whether all three are from the same source. Two seem identical (despite different NSDL OAI identifiers); the third is an article discussing Reciprocal Net that appeared in an NSDL Whiteboard report. Moreover, Reciprocal Net is tagged for three grade levels ("graduate, undergraduate, grades 10-12") yet only shows up in the "Graduate" search. In early April 2006, NSDL reinstated the collection icons in search results pages, allowing users to see the collection in which the resource resides, which helps to address some of these issues.

NSDL's New Resource-Centric Fedora Architecture [[125]]

NSDL's conversion to a Fedora repository marks a major transition from a metadata-centric to a resource-centric data model and search service. According to NSDL developers:

Digital libraries need to distinguish themselves from web search engines in the manner that they add value to web resources. This added value consists of establishing context around those resources, enriching them with new information and relationships that express the usage patterns and knowledge of the library community. The digital library then becomes a context for information collaboration and accumulation - much more than just a place to find information and access it. (Lagoze et al. 2005)
Finding the metadata-based model inadequate, the developers describe "an information network overlay within Fedora, which includes the full functionality of the existing metadata repository, but models relationships, services, and multiple information types within a web-service based application" (Lagoze et al. 2005). More recently, NSDL principals analyzed the many difficulties they have encountered over several years in relying on metadata to build the NSDL. They provide persuasive evidence for the new "resource-centric architecture that integrates less structured forms of information, which collectively add value and context to digital resources." As they explain:
Traditional structured metadata plays a role in such information contextualization. However, it exists as a component of a resource-centric model, rather than being the focus of the information model itself. (Lagoze et al. 2006a, 3)
Their discussion goes beyond metadata quality to investigate other issues that add complexity and cost to operating a large-scale metadata aggregation site like the NSDL. For example, they reveal dismal harvesting statistics, citing an overall success rate of 64 percent and a monthly failure rate of 25 to 50 percent. They attribute harvest failures equally to three broad areas:
  • 1. a communications or system failure either at the data provider's server or with the NSDL's OAI harvester
  • 2. OAI protocol violations
  • 3. invalid XML data, XML schema non-compliance, or XML, URL or UTF-8 character encoding (Lagoze et al. 2006a, 5)
Resolving harvesting failures entails extensive email communication, estimated at 170 messages per provider per year.
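
The three failure categories above can be illustrated with a small classifier. The Python sketch below is a simplified assumption of how a harvester might sort responses, not NSDL's harvester code: the `classify_harvest` function and the sample payloads are hypothetical, a `None` body stands in for a failed connection, and the only protocol check is for the `OAI-PMH` root element with the `responseDate` and `request` children that the OAI-PMH 2.0 specification requires in every response.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical well-formed response used for the usage example below.
GOOD = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<responseDate>2006-05-01T00:00:00Z</responseDate>"
    '<request verb="ListRecords">http://example.org/oai</request>'
    "<ListRecords/></OAI-PMH>"
)


def classify_harvest(response_body):
    """Sort one harvest attempt into 'ok' or one of three failure types.
    A body of None stands in for a failed connection (assumption)."""
    if response_body is None:
        return "communications or system failure"
    try:
        root = ET.fromstring(response_body)
    except ET.ParseError:
        return "invalid XML or encoding"
    required = (root.find(OAI_NS + "responseDate"), root.find(OAI_NS + "request"))
    if root.tag != OAI_NS + "OAI-PMH" or any(el is None for el in required):
        return "OAI protocol violation"
    return "ok"


# Tally outcomes across a batch of simulated harvest attempts.
attempts = [GOOD, None, "<OAI-PMH>", "<html/>", GOOD]
tally = Counter(classify_harvest(body) for body in attempts)
print(tally)
```

A tally of this kind, kept per provider and per month, is one way an aggregator could track success and failure rates like those cited above; schema validation of the harvested metadata is deliberately left out of this sketch.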

The new architecture is intended to model resources rather than metadata and permit the provision of richer information, including context and less-structured metadata. The infrastructure is also making possible a number of new NSDL applications described by Lagoze and his colleagues.

Sustaining NSDL Collections and Services

Faced with the prospect of diminishing NSF funds, NSDL is increasingly turning its attention to strategies that will sustain its efforts and integrate them into established library services. NSDL's Sustainability Standing Committee Chair, Paul Arthur Berkman, outlines four components of NSDL, each requiring its own strategies if the NSDL is going to survive as a collaborative, coordinated effort in which the whole is greater than the sum of its parts.

  • Program Sustainability involves strategies to facilitate long-term collaborations among projects, users, sponsors, federal agencies and other stakeholders that share in the progress of the NSDL.
  • Project Sustainability involves the public-private-university-government strategies to support the creation, maintenance and evolution of collections and services in the NSDL.
  • User-Community Sustainability involves the networking, outreach and engagement strategies that are necessary to grow the community of users, members and sponsors who will support the NSDL into the future.
  • Technical Sustainability involves coordination among technology developers and the overall program to develop the NSDL in a persistent, functional, and visionary manner.

In 2004 NSDL began to publish "sustainability vignettes" in the Whiteboard Report for specified projects. The seven vignettes issued to date represent a range of multi-faceted approaches to continuation. [[127]] The Math Digital Library, for example, is creating new value-added services in close consultation with members of the Mathematical Association of America (MAA). According to MathDL's vision, new components (for example, MAA Reviews, Classroom Capsules, online MathDL books, and meeting and workshop software) would be free to members, but non-members would be required to subscribe or pay a usage fee. Similarly, MathDL's Journal of Online Mathematics and its Applications (JOMA) may transition to a member-only benefit, requiring others to pay for access. In brief, MathDL's sustainability plan hinges on a combination of support from MAA and from direct income streams. Another NSDL project, Teachers' Domain, sponsored by WGBH, is seeking "collaborative partnerships and strategic alliances," along with the expectation that its courses will become self-funded through licenses to educational institutions and organizations.

The NSDL Sustainability Standing Committee is developing a decision-tree exercise, designed to help principal investigators determine if and how to sustain their NSDL projects. Alternative decision paths branch out from responses to questions about the project's sustainability objectives, its relevance, institutional support, and market opportunities, resulting in recommendations to discontinue the project as unsustainable or to consider open-source community, not-for-profit or for-profit corporation resolutions.

Several other initiatives, addressing user-community and technical sustainability, merit discussion. Effective October 1, 2003, the California Digital Library (CDL) received a two-year NSF grant to develop and enrich the NSDL by determining how to best integrate it into academic library services. In an effort to support the development of NSDL's long-term business plan, the grant provided for a market assessment to determine user needs and expectations of high-quality science online resources. Through focus groups, interviews, and a comparative review of user-specified high-quality science resources (e.g., HighWire, Scirus, PubMed, CiteSeer), CDL market research revealed:

  • Limited prior awareness of NSDL; lack of differentiation vs. other government science Web sites (e.g., Science.gov).
    • N.B. In May 2006, NSDL announced that Science.gov had added NSDL to its collection. According to the announcement in NSDL's Whiteboard Report: This means that users can search all the science databases and more than 1,800 science Web sites at Science.gov (http://www.science.gov/), plus the 1.1 million records of science, technology, engineering and mathematics education resources at NSDL, with just one click. [[128]] (See Figure 27 below.)
  • Strong resistance to institutional subscription model, especially in current California K-12 funding climate.
  • Most participants see more value in the NSDL collection as a classroom teaching aid for K-12.
  • Academic libraries see limited value in another Web science portal, but would be willing to consider paying for deep integration with their existing search tools.
  • Mixed levels of interest in personalization and publishing tools. (California Digital Library 2004, 1)
There is further evidence from this 2006 review of NSDL's functionality (and its new technical infrastructure) that the findings and recommendations of CDL's market assessment are informing NSDL's current development and fund allocation. For example, in April 2006 NSDL Core Integration was awarded a grant, in collaboration with Utah State University and SUNY-Cortland, to help teachers learn to design educational activities with NSDL resources. The project is expected to lead to more teacher-designed and contributed content in NSDL and will also measure the impact of project activities on teaching practice. [[129]]

CDL's recommendations are annotated below with checkmarks to indicate areas of subsequent progress ("o" indicates not implemented as of mid-May 2006):

  • NSDL should provide free, open access to its basic collection through a public Web portal that provides basic metasearch features and a browsable subject hierarchy.
    • Browsable subject hierarchy instated.
  • Improve current NSDL portal by improving visibility of search, creating browsable subject hierarchy in HTML, and a clear statement of purpose and intended audience.
    • NSDL has developed five entry points geared towards different audiences.
  • Encourage K-12 classroom use by providing access to lessons plans, subject guides, and interactive features; consider partnering with established K-12 content providers.
    • These features are available via the ASKNSDL service.
    • Content Assignment Tool aligns national and state educational standards to resources.
    • Several Pathways partners are addressing this, e.g., WGBH's Teachers' Domain, AAAS's Biological Sciences Pathway (via BEN portal), Engineering Pathway (merger of NEEDS and TeachEngineering) and the Middle School Pathway (Ohio State University).
    • Established partnership with the National Science Teachers Association (NSTA) to deliver 11 NSDL Online Science Web seminars through June 2007.
  • Explore development of value-added services for academic libraries, including:
    • o MARC record export
    • o OpenURL support
    • o Integration with other federated search platforms
    • o Mapping of controlled vocabularies (e.g. MeSH-type thesaurus)
In addition, CDL recommended that NSDL evaluate incorporation of various features suggested by focus group participants. Items with checkmarks have been implemented:
  • o Citation linking
  • o Abstracts
  • o "Smart parsing" of search terms (e.g., cell biology > "cell biology")
  • o Suggest related terms based on search input
  • o Search history
  • Ability to rank search criteria: NSDL removed rank from search results based on user testing. Results are sorted based on rank.
  • Image search tools (e.g., Browse, "NSDL At a Glance")
  • "Search within these results": beta version in place on the development server, not in production.
  • Personalized views of the collection
  • Community features (e.g., discussion forums, listservs, RSS for registered users) (Features list from CDL 2004, 20; annotated with check-marks by author)
A second focus of the grant is to develop a prototype service that integrates NSDL into "foundational science collections managed by libraries" and provide the tools to create different views "customized to the needs of different patrons." [[130]] As of this writing, the prototype NSDL service integration is not yet available, but CDL is creating a portal for the geosciences (FindIt: Earth Science) that offers users a unified interface to search domain-specific proprietary databases (e.g., GeoRef, Web of Science) alongside OAI-harvested NSDL and DLESE records and items retrieved via targeted Web crawling. [[131]]

Other major services, for example Science.gov, are starting to integrate NSDL resources into their search capability.

Source: http://www.science.gov/ (May 6, 2006)

Finally, the OCKHAM Initiative (described in section 3.2), led by Emory University and Oregon State University, aims to establish "an extensible framework for networked peer-to-peer interoperation among the NSDL and traditional libraries." To this end, it is developing a suite of tools (middleware) to help integrate NSDL collections and services into traditional library service environments while also creating a current awareness alerting service and a registry to facilitate machine-to-machine and end-user discovery of digital library services. This is vital to the future effective interoperation among existing NSDL collections. In addition, NSDL and DLF are working together to establish and promulgate Best Practices in Shareable Metadata as discussed earlier in this report.

Leveraging Individual Project Activities and External Relationships

This description of NSDL concentrates in large part on its Core Integration activities as an aggregator of STEM collections and services. NSDL, however, makes many other valuable contributions to advancing STEM teaching and learning by leveraging partnerships between individual projects and national partners. This is particularly evident in NSDL's involvement in promoting educational achievement standards and professional development workshops. To cite one prominent example, the NSDL Achievement Standards Network (ASN), developed with NSDL funding by Jes & Co. (http://www.jesandco.org/), will provide hands-on learning standards systems for every state. The NSDL resource records are part of the State Educational Technology Directors Association (SETDA) 2006 Tool Kit (http://www.setda.org/content.cfm?SectionID=265), developed in conjunction with the U.S. Department of Education. The initiative includes tools, technologies and best practices that enable states to manage electronic versions of their academic standards, align resources and assessments consistently using open and interoperable methods, and embed standards seamlessly in all manner of learning and assessment systems and systems of accountability.

4.3.2 SMETE: Science, Mathematics, Engineering and Technology Education Digital Library

Update Table 15: SMETE based on DLF Survey responses, Fall 2005
SMETE Digital Library
NEEDS: National Engineering Education Delivery System http://www.needs.org/needs/
ORGANIZATIONAL MODEL: Open federation, voluntary membership w/ partners and affiliates funded by NSF and other public/private agencies
SUBJECT: Science: science, mathematics, engineering & technology
FUNCTION: Collection of collections and community of communities
SIZE: 9,500 resources in SMET disciplines including 2,200 engineering resources
USE: Per month: 30,000 page hits
ACCOMPLISHMENTS:
1. Interoperability with other digital libraries.
2. Providing digital repository services, e.g., Digital Chemistry, Exploratorium, NCWIT (National Center for Women & Information Technology).
3. Community development with Premier Award and monthly theme pages.
CHALLENGES:
1. Expanding community building beyond ASEE (American Society for Engineering Education) audience.
2. Sustainability planning.
3. Quality control of metadata and contents of the learning objects in merger between NEEDS (National Engineering Education Delivery System) and TeachEngineering: Resources for K-12.
TOOLS OR RESOURCES NEEDED:
1. Push technologies. NSDL On-Ramp.
2. Community building tools, e.g., Threaded discussion forums, Blogs, Newsletters.
GOALS OF NEXT GENERATION RESOURCE: NEEDS will be merging with TeachEngineering to form the new Engineering Pathway to serve the entire engineering education community from K-12 to lifelong learning. SMETE.org will continue to be the NEEDS technology platform, providing support for other online learning projects such as the Mobile Learning project sponsored by HP and CITRIS (Center for Information Technology in the Interest of Society).

The SMETE Open Federation continues as a membership organization launched with NSF NSDL Collection and Core Integration funding whose "primary mission is to establish universal access to academic excellence in SMET education." The Federation has more than forty partners, including the American Association for the Advancement of Science (AAAS), the Coalition for Networked Information (CNI), and OCLC, as well as other digital libraries dedicated to science education (including all of the services under review in this section) and a dozen universities and corporations. SMETE helps to develop leading-edge technologies to share among its members while also maintaining a collection of premier learning materials.

SMETE collaborated with the Exploratorium, in San Francisco, California, to create the Exploratorium Digital Library, a collection of high-quality teaching resources and activities (http://www.exploratorium.edu/educate/dl.html) that is also integrated into NSDL. SMETE has also provided technology services to other digital libraries including BioSciEdNet (BEN, http://www.biosciednet.org/portal/), MathDL (http://mathdl.maa.org/), and the Digital Chemistry (http://socrates.berkeley.edu/~kubinec/). SMETE resources are cataloged to meet the requirements of the IEEE Learning Object Metadata Standard and SMETE has developed tools to transform local application profiles (e.g., from LON-CAPA, http://www.lon-capa.org/ and the Michigan Teacher Network, http://mtn.merit.edu/) to normalized application profiles. SMETE collaborates with MERLOT on peer reviews. [[132]]

In addition to supporting search queries by keyword, author/creator, title, and publication date range, the user interface offers various options to limit searches by more than 20 different types of learning resource (e.g., case study, dataset, lesson plan); grade level (primary education to post-graduate and vocational training to professional development); and eight specific collections (e.g., ACM Women in Computing, Math Forum, Michigan Teachers Network, NEEDS). Searches can be restricted to peer-reviewed materials. Search results are returned with briefly annotated entries including a search score. Each result is clearly branded according to its platform (e.g., PC, MAC, Web); cost (e.g., free or $); availability of reviews; and native collection. Registered users can create a profile and save resources in a workspace. User information can be shared to identify other community members with similar interests.

The results screen provides users with related terms to extend the search as well as the ability to conduct a federated keyword search in partner collections. The partner collections include NSDL, MERLOT, and NEEDS. A technical report available at SMETE explains its strategy for adopting a SOAP-based SMETE Search API to implement federated searches across heterogeneous collections. [[133]]

NEEDS: National Engineering Education Delivery System

The American Society for Engineering Education in partnership with seven leading engineering schools (e.g., UC-Berkeley, Worcester Polytechnic Institute, Colorado School of Mines) is creating a unified K-gray engineering pathway, under the auspices of NSDL. NEEDS, a digital library for engineering education, will merge with TeachEngineering (Resources for K-12) to establish a single comprehensive portal for engineering. Both NEEDS and TeachEngineering (TE) are highly regarded by their respective communities. Through its annual "Premier Award" courseware competition, NEEDS is a national leader in stimulating and evaluating high-quality engineering courseware targeted for undergraduate teaching. It has translated the award selection criteria into best practices in courseware design, helping to promulgate high standards of excellence. Through the combined expertise of NEEDS and TE, they expect to:

  • Significantly and sustainably grow high-quality resources;
  • Align the unified curricular materials with appropriate undergraduate and K-12 educational standards;
  • Grow the participation of content providers and users;
  • Enhance quality control and review protocols for content; and
  • Expand gender equity and ethnic diversity components by cataloging and reviewing curricular resources created by female-centric and minority-serving organizations.
As an initial step in developing a unified service based on SMETE's technology platform, NEEDS and TeachEngineering (TE) launched a blog to discuss desirable features for the new pathway. An initial list of tools and services included:
  • Browse curriculum (TE)
  • Search resources/curriculum by Keyword, Grade Level, Educational Standard (TE), Required Time (TE), Cost (TE), Learning Resource Type (NEEDS), Title (NEEDS), Author (NEEDS), Review (NEEDS), Series (NEEDS), Host Collection (NEEDS), Publication Year (NEEDS)
  • Personal workspace (MyTE, NEEDS Workspace)
  • Reviews for resources/curriculum
  • OAI server that exports NSDL Dublin Core metadata for harvesting
  • Recommendation system (NEEDS)
  • Web service for search through SOAP (NEEDS)
  • Metathesaurus to suggest related search terms (NEEDS)
  • RSS feeds of new resources (NEEDS)
  • Online cataloging (NEEDS) [[134]]
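
The OAI server in the list above refers to OAI-PMH, the harvesting protocol through which services such as NSDL collect metadata from providers. As a rough illustration of what a harvester does with a provider's output, the sketch below parses a small ListRecords-style response and extracts identifiers and Dublin Core titles. The sample XML, the record identifier, and the function name `extract_records` are invented for illustration; the sample uses the generic `oai_dc` format rather than NSDL's own Dublin Core profile, and it is not actual output from NEEDS or TeachEngineering.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response fragment (not real harvested data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:resource-1</identifier>
        <datestamp>2006-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bridge Design Lesson</dc:title>
          <dc:subject>Engineering</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_records(xml_text):
    """Return a list of {identifier, title} dicts from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
        })
    return records


records = extract_records(SAMPLE_RESPONSE)
for entry in records:
    print(entry["identifier"], "|", entry["title"])
```

A real harvester would fetch such responses over HTTP and follow resumption tokens for large result sets; both steps are omitted here to keep the sketch self-contained.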

4.3.3 BioSciEdNet (BEN) Collaborative

Update Table 16: BEN Collaborative based on DLF Survey responses, Fall 2005
BioSciEdNet (BEN) Collaborative
ORGANIZATIONAL MODEL: Collaborative sponsored by the American Association for the Advancement of Science and other disciplinary organizations.
SUBJECT: Science: biological sciences
FUNCTION: Portal to digital libraries for teaching and learning in the biological sciences.
SIZE: Collaborators increased from 15 to 22 (46.6% growth). Peer-reviewed resources grew from 1,000 to 4,100 (310% growth). Registered users grew to 5,500 and 92% are educators. BEN covers 76 (previously 51) topics in the biological sciences.
USE: Per month: >1.4 million visitors to the BEN portal and collaborator sites. ~6,000 registered users: 91% teach (62% at undergraduate and 19% at high school level).
ACCOMPLISHMENTS:
1. Initial development of models for transforming smaller organizations into contributors of resources to digital libraries.
2. Increased the number of peer-reviewed individual biological sciences learning objects or resources.
3. Conducted a BEN User Survey in September 2004, where 515 responses were returned in a 3-week timeframe, representing a 14% return rate.
CHALLENGES:
1. Building and supporting a diverse contributor/user base for the digital libraries is one of the most critical issues that BEN faces. Since undergraduate biology is a core course in many colleges and universities and high school biology educators tend to teach 4 to 5 biology classes a day, these educators often have severe constraints on both time and resources.
2. Building digital collections that are inclusive of all educators and students. Biological sciences educators, particularly in high schools and community colleges and regional comprehensive institutions, have student bodies diverse in every respect - learning styles and ability, geography, economics, race, gender, physical disabilities, and experience.
3. Streamlining and lowering the barriers to participation by additional organizations that develop high-quality peer-reviewed bioscience educational materials, but don't have the technology or staff to develop digital library collections from the ground up.
TOOLS OR RESOURCES NEEDED: Development of a BEN Faculty Campus Representative Program for Increasing Contributors and Users of BEN and the NSDL. Establishment of mentor relationships between mature and new BEN Collaborators. Provide software tools for BEN Collaborators.
GOALS OF NEXT GENERATION RESOURCE:
1. To increase the number of Collaborators from which BEN aggregates resources from 13 to 22.
2. Through mentoring and technical assistance to other organizations, the total number of biological sciences digital libraries developed by members of the BEN Collaborative would increase from 6 to 13.
3. Develop a Faculty Campus Representative Program, including related professional development, materials and a demonstration CD ROM. Through the Faculty Campus Representative Program, 45 college and university faculty members, geographically dispersed around the US, will be prepared to provide campus and community-based workshops and technical assistance in selected areas for an estimated 2,700 prospective contributors to both BEN and the NSDL.
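
The growth figures in the SIZE row of the table above follow from the ordinary percent-change formula. The short sketch below reproduces them; the function name `percent_growth` is ours, and the input numbers are taken directly from the table.

```python
def percent_growth(before, after):
    """Percentage growth from `before` to `after`."""
    return (after - before) * 100 / before


# Figures from the SIZE row of Table 16.
collaborator_growth = percent_growth(15, 22)
resource_growth = percent_growth(1000, 4100)

print(f"Collaborators: {collaborator_growth:.1f}% growth")
print(f"Peer-reviewed resources: {resource_growth:.0f}% growth")
```

The collaborator figure works out to roughly 46.7 percent when rounded to one decimal place; the table's 46.6 percent appears to be the same value truncated rather than rounded. The resource figure matches the table's 310 percent.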

In fall 2005 the BEN Collaborative, led by the American Association for the Advancement of Science (AAAS) with a dozen founding-partner professional societies, received NSF NSDL funding to expand into a Biological Sciences Pathway for educators at the high school and undergraduate levels. [[135]] Over a four-year period, the Pathway funding will enable BEN to increase the number of collaborators from which it aggregates resources from 13 to 22; the number of digital libraries it helps professional society members to develop from 6 to 13; and the number of cataloged resources in the BEN metadata repository from 4,000 to 27,000 items. With more than 100 professional organizations in the life sciences, BEN's core content aims to jump-start teaching introductory biology courses by unifying resources that are otherwise highly fragmented and widely dispersed.

The Pathway builds on BEN's successful track record as a portal manager providing database development, resource cataloging, metadata validation software tools, and Web trend reporting for professional societies. BEN's Learning Object Management (LOM) cataloging system has seven components:

  • General
  • Lifecycle
  • Technical
  • Educational
  • Rights
  • Classifications (subject taxonomy and pedagogic use taxonomy)
  • Metadata

In addition to developing digital libraries with common technical standards that contribute resources to the BEN portal, BEN partners promote best practices for pedagogy, authentic assessment and the development of multidisciplinary biological sciences resources. A shared online workspace facilitates communication among collaborators. BEN relies on NSDL's technical architecture for integration of its resources into the NSDL Data Repository as well as access to NSDL's new applications (e.g., Expert Voices, Content Assignment Tool).

Table 21: BEN Partner Libraries

Existing Digital Libraries
AccessExcellence.org: National Health Museum http://www.accessexcellence.org/
APSArchives.org: American Physiological Society http://www.apsarchive.org/Main/index.asp
BioMoleculesAlive.org: American Society for Biochemistry and Molecular Biology http://www.biomoleculesalive.org/
EcoEd.net: Ecological Society of America http://www.ecoed.net/
MicrobeLibrary.org: American Society for Microbiology http://www.microbelibrary.org/
Science's STKE: American Association for the Advancement of Science http://stke.sciencemag.org/

New Digital Libraries
AIBS: American Institute of Biological Sciences http://www.actionbioscience.org/
BCC: BioQuest Curriculum Consortium http://www.bioquest.org/
BSA: Botanical Society of America http://www.botany.org/
DNALC: Dolan DNA Learning Center http://www.dnalc.org/
EntDL: Entomology Digital Library http://cipm.ncsu.edu/PIinfo.cfm?PIID=10062003024214 (under development)
SDB: Society for Developmental Biology http://www.sdbonline.org/
VIDA: Video and Image Data Access (VIDA)/Cal State Fullerton http://scied.fullerton.edu/vida/vidapedagogy.html


To ensure quality control of learning object resources, BEN partner societies are expected to establish a peer review framework that specifies the review timeline, criteria, ranking, and types of reviewers involved in evaluating each type of resource. Examples of the peer-review processes created by its constituent professional societies are available from BEN's Web site. [[137]] While the number of BEN resources is relatively low at present, it is one of the few NSDL projects with a coherent cohort of peer-reviewed, individually tagged lesson plans and classroom activities. As of January 2006, BEN's inventory of 4,111 resources included:

  • AAAS (220 lesson plans and multimedia resources)
  • ABLE (66 Lab Exercises and Manuals; 2 Teaching Strategies )
  • AIBS (184 teaching and learning resources)
  • APS (501 teaching and learning resources)
  • APSNet (57 Plant Disease Lessons and articles)
  • ASBMB (39 articles and interactive resources)
  • ASM (1141 teaching and learning resources)
  • BSA (948 annotated images)
  • ESA (192 teaching and learning resources)
  • FUN (20 journal articles)
  • HAPS (266 journal and newsletter articles)
  • NHM-Access Excellence (206 teaching and learning resources)
  • STKE (317 reviews, perspectives, and multimedia resources)
  • SOT (9 teaching and learning resources) [[138]]
BEN's user interface supports basic keyword and advanced searches along with browsing by subject and resource type. The number of items represented in each of the 76 subject areas ranges from microbiology and botany with more than 1,000 resources to hematology and glycobiology with fewer than five. The 44 categories of resource types span from images (1,352 items) and journal articles (747 items) to maps, discussion groups and assessment-exam with answer key (1 item). Advanced search offers a variety of filters, described in the previous report. As the aggregation of cataloged resources grows, the utility of these filters will increase. BEN's User Survey, conducted in September 2004, found that users (550 responses) accessed all the BEN partner sites almost equally; 56 percent downloaded resources and 67 percent used BEN resources for lectures. [[139]]

In four years' time, BEN expects to have established 45 college and university faculty representatives around the country who are trained to provide assistance to prospective BEN contributors and users. BEN operates under the aegis of a Coordinating Council that includes representatives from the AAAS and four professional societies as well as a national Advisory Board composed of college and university educators.

4.3.4 DLESE: Digital Library for Earth System Education

Update Table 17: DLESE based on DLF Survey responses, Fall 2005
DLESE: Digital Library for Earth System Education
ORGANIZATIONAL MODEL: Community-based organization with NSF funding.
SUBJECT: Science: Geosciences
FUNCTION: Information system and services to facilitate learning about the Earth system at all educational levels.
SIZE: 12,000 learning resources in > 20 collections, continually growing. Includes community-contributed teaching tips, resource reviews, and news and opportunities announcements.
USE: Per month: 50,00 user sessions
ACCOMPLISHMENTS:
1. Ongoing accessioning of multiple collections.
2. Services-oriented architecture (SOA) including Web search service and JavaScript search that allows for customized search interfaces and greater dissemination of resources (Weatherley 2005).
3. Distributed, Web-based cataloging tool that supports multiple collections and multiple metadata frameworks.
4. OAI data provider and harvester tool.
CHALLENGES:
1. Strategic planning.
2. Continuing to meet the emerging needs of the geosciences education community.
3. Connecting with other geoscience cyberinfrastructure initiatives that will help integrate research and education.

Funded by NSF's Directorate for Geosciences, the DLESE Program Center (DPC) operates under the aegis of the University Corporation for Atmospheric Research (UCAR) in Boulder, Colorado. DLESE plays a leadership role in bridging the education and research components of geoscience cyberinfrastructure (Marlino et al. 2004).

The goals of the DLESE Program Center are to:

  • develop and provide library infrastructure tailored to specific geoscience education needs;
  • enable distributed collections and services to act as an integrated whole;
  • provide interoperability services with other library efforts (e.g., NSDL);
  • support community capacity building by providing tools, components, and services that enable the development of high-quality collections of teaching and learning resources;
  • conduct ongoing library operations; and
  • offer broad-based community support. [[140]]
DLESE serves both K-12 science instruction and undergraduate education. According to a short user survey conducted from October 2004 through February 2005, 34 percent of respondents are K-12 science teachers and 12 percent college/university faculty members; 13 percent are K-12 students and 10 percent are college students. Developers of educational materials account for 7 percent; parents, librarians, and others (non-geoscience teachers, outreach coordinators, professional development experts, and DLESE staff) comprise the remaining 24 percent. [[141]]

What are they seeking?

30 percent: Materials for students
18 percent: Materials for an assignment
13 percent: Information about the library (i.e., DLESE)
7 percent: Information for curriculum development
6 percent: Information for their own learning
5 percent: Collaborators for a project [[142]]

DLESE maintains two primary collections. Resources in the "DLESE Community Collection" (~7,100 items) meet basic guidelines in terms of subject relevance and functionality. [[143]] The more selective "DLESE Reviewed Collection" [[144]] is composed of resources (~670 items) that have been evaluated against seven criteria:

  • 1. scientific accuracy;
  • 2. pedagogical effectiveness;
  • 3. completeness of documentation;
  • 4. ease of use for teachers and learners;
  • 5. ability to inspire or motivate learners;
  • 6. importance or significance of the content, and
  • 7. robustness as a digital resource. (Kastens et al. 2005)
In addition, DLESE collects metadata from other digital libraries (e.g., Alexandria Digital Library) and thematic collection developers (e.g., the Digital Water Education Library-DWEL or the Earth Exploration Toolbook-EET). The EET is an innovative collaboration that utilizes earth science data within NSDL and DLESE to create an online collection of computer-based learning problem-solving activities. [[145]] Currently EET has fourteen chapters organized around learning activities, such as analyzing the Antarctic Ozone Hole, exploring regional differences in climate change, or visualizing carbon pathways. Each chapter is accompanied by relevant datasets (derived from NASA, USGS, and U.S. Census data or other sources) and technology tools (e.g., GIS, image processing programs, spreadsheet applications). To facilitate use, EET has created companion, professional development Data Analysis Workshops for teachers.

Users can browse DLESE's collections by subject, resource type, and grade level.

Source: http://www.dlese.org/dds/histogram.do?group=gradeRange&key=drc (May 2006)

As illustrated in the figure above, each collection has a bar graph, charting the number of resources as well as a collection annotation and link to the collection's scope and policy statement.

Text searches can be filtered by grade level, resource type, collection, and educational standard. At present, DLESE has the ability to search by National Science Education Standards (NSES) and by National Geography Standards (NGS). The National Geography Standards organize learning concepts under six broad topical categories: Environment and society, Human systems, Physical systems, Places and regions, the Uses of geography, and the World in spatial terms, for a total of 18 individual standards. The NSES are hierarchical and permit users to choose grade level, broad topic, and learning goal. For example:

  • Grades 9-12
    • Earth and space science
      • Energy in the earth system
      • Geochemical cycles
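
One simple way to picture this three-level hierarchy (grade level, broad topic, learning goal) is as a nested mapping. The sketch below is only an illustration of that structure using the example entries above; the variable and function names are ours, and it does not describe how DLESE actually stores the standards.

```python
# Illustrative fragment of the NSES hierarchy, limited to the example above.
NSES_SAMPLE = {
    "Grades 9-12": {
        "Earth and space science": [
            "Energy in the earth system",
            "Geochemical cycles",
        ],
    },
}


def standard_paths(tree):
    """Flatten the hierarchy into (grade level, topic, learning goal) tuples."""
    return [
        (grade, topic, goal)
        for grade, topics in tree.items()
        for topic, goals in topics.items()
        for goal in goals
    ]


paths = standard_paths(NSES_SAMPLE)
for grade, topic, goal in paths:
    print(f"{grade} > {topic} > {goal}")
```

Flattening the tree into full paths is what lets a search filter match on any level: a user who selects only a grade level matches every path that begins with it.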
DLESE is working jointly with Syracuse University's Center for Natural Language Processing (CNLP) to incorporate additional state and national standards into the library and connect with the Achievement Standards Network (ASN) database maintained by JES & Co. and funded by the NSDL. In addition, the CNLP released a prototype of its Content Assignment Tool (CAT) as an API integrated within DLESE's Collection System (DCS) in early 2006 (Diekema and Devaul 2006). CAT uses natural language processing to analyze the content of learning resources, such as lesson plans, and then automatically suggests relevant national and/or state standards. It is intended not only to aid catalogers in assigning appropriate standards and to provide a cross-walk between different state and national standards, but also to permit users to save their choices to a database. A beta version is currently available for testing by registered users. [[146]]
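
To make the idea of automatic standards suggestion concrete, the sketch below ranks candidate standards by simple word overlap with a resource description. This is a deliberately naive stand-in, not CAT's method: CAT's natural language processing is far more sophisticated, and the function names, stopword list, and sample lesson text here are all invented for illustration. Only the two standard names come from the NSES example earlier in this section.

```python
import re

# Tiny illustrative stopword list (an assumption, not from any real tool).
STOPWORDS = {"a", "and", "in", "of", "the", "to"}


def tokens(text):
    """Lowercase word set with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def suggest_standards(resource_text, standards, top_n=1):
    """Rank standards by the share of their words found in the resource."""
    resource_words = tokens(resource_text)
    scored = []
    for name in standards:
        words = tokens(name)
        if not words:
            continue  # nothing to match on
        overlap = len(words & resource_words) / len(words)
        scored.append((overlap, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical lesson-plan description.
lesson = "Students trace how carbon moves through geochemical cycles in rocks, oceans, and air."
candidates = ["Energy in the earth system", "Geochemical cycles"]
suggestions = suggest_standards(lesson, candidates)
print(suggestions)
```

Even this toy version shows why a human cataloger stays in the loop: word overlap alone would miss a lesson that covers a standard without using its vocabulary, which is where the tool's suggestions need review.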

DLESE makes innovative use of its "Community Review System" (CRS) to create customized reports for teachers that assess the effectiveness of digital learning resources in their classrooms (Kastens and Holzman 2006). The Introductory Geoscience Virtual Textbook was created as a test bed for the CRS individualized teacher report system, utilizing DLESE resources to teach students about basic concepts in Earth science. [[147]] Both students and the instructor write reviews of the digital resources based on the seven criteria noted above for "reviewed resources," and the DLESE CRS then creates a report aggregating and comparing the data from the instructor's and students' perspectives. Examples of various types of reports generated by the CRS are available at DLESE's Web site. [[148]]
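
The kind of aggregation such a report involves can be sketched as follows: average the students' ratings on each criterion and set them beside the instructor's. The numeric rating scale, the data layout, and the function name `compare_reviews` are assumptions made for this illustration; the actual CRS report format is documented at DLESE's site and may differ considerably.

```python
from statistics import mean

# The seven review criteria listed earlier in this section.
CRITERIA = [
    "scientific accuracy",
    "pedagogical effectiveness",
    "completeness of documentation",
    "ease of use for teachers and learners",
    "ability to inspire or motivate learners",
    "importance or significance of the content",
    "robustness as a digital resource",
]


def compare_reviews(instructor, students):
    """Compare instructor ratings with mean student ratings per criterion.

    `instructor` maps criterion -> rating; `students` is a list of such maps.
    Criteria lacking either an instructor or any student rating are skipped.
    """
    report = {}
    for criterion in CRITERIA:
        scores = [s[criterion] for s in students if criterion in s]
        if criterion not in instructor or not scores:
            continue
        student_mean = mean(scores)
        report[criterion] = {
            "instructor": instructor[criterion],
            "student_mean": student_mean,
            "gap": instructor[criterion] - student_mean,
        }
    return report


# Hypothetical ratings on an assumed 1-5 scale.
instructor_review = {"scientific accuracy": 5, "ease of use for teachers and learners": 3}
student_reviews = [
    {"scientific accuracy": 4, "ease of use for teachers and learners": 4},
    {"scientific accuracy": 4, "ease of use for teachers and learners": 2},
]
report = compare_reviews(instructor_review, student_reviews)
for criterion, row in report.items():
    print(criterion, row)
```

A positive gap means the instructor rated the resource higher than the class did on that criterion, which is the sort of divergence a teacher might want to investigate.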

Source: http://www.dlese.org/ (May 2006)

Since the 2003 DLF report was issued, DLESE's information technology infrastructure has evolved into a service-oriented architecture (SOA), with improved interoperability capabilities that extend its reach through Web service and JavaScript APIs (Weatherley 2005) (see http://www.dlese.org/dds/services/). The Center for Ocean Science Education Excellence (COSEE, http://www.cosee.net/), for example, has embedded a custom DLESE search in their Web portal that is implemented using the DLESE Search Web Service and the My NASA Data portal utilizes a custom search page implemented with the JavaScript API (http://mynasadata.larc.nasa.gov/DLESE_search.html). In addition to these, DLESE services and APIs are being used to deliver DLESE resources interactively to users of GLOBE, NASA S'COOL, the GEON portal and several other institutional Web sites and portals.

The California Digital Library is harvesting DLESE's OAI records and integrating them into a geosciences portal tailored to the users of the UC campus libraries. DLESE is also a Principal Investigator (PI) Institution in GEON, a network building cyberinfrastructure capacity in geoinformatics for research (GEON) and educational (DLESE) purposes.

GEON is based on a service-oriented architecture (SOA) with support for "intelligent" search, semantic data integration, visualization of 4D scientific datasets, and access to high performance computing platforms for data analysis and model execution -- via the GEON Portal. http://www.geongrid.org/
GEON and DLESE interoperate in a number of important ways (Wright 2004). GEON uses the ADN Metadata Framework (jointly developed by the Alexandria Digital Library, the NASA Science Mission Directorate, and DLESE) [[149]], and the two services share collection records. GEON Web services and content are available in DLESE (http://geon01.dlese.org/) and the GEON Portal provides access to DLESE. GEON, DLESE, and the University of Colorado are collaborating to create an "Educational Knowledge Organization System" (EKOS) that supports conceptual browsing (concept strand maps) to align learning outcomes and educational standards with DLESE's resources (Wright 2004, Sumner et al. 2004). [[150]]

Source: http://preview.dlese.org/jsp/cms/ (May 2, 2006)

Custard and Sumner (2005) report on their research, "Using Machine Learning to Support Quality Judgments," about digital resources and collections. NSDL and DLESE were used as a test case for their research to determine if a set of "indicators could be used to accurately classify resources into different quality bands and to determine which indicators positively or negatively influenced resource classification." According to the authors, "The results suggest that resources can be automatically classified into quality bands, and that focusing on a subset of the identified indicators can increase classification accuracy." In the future, collection curators may rely on these "next generation cognitive tools" to support their qualitative decisions about which digital resources to acquire.

Publications and presentations by members of the DLESE community are listed in the bibliography maintained at the DLESE Web site. [[151]]

4.3.5 MERLOT: Multimedia Educational Resource for Learning and Online Teaching

Update Table 18: MERLOT based on DLF Survey responses, Fall 2005
MERLOT: Multimedia Educational Resource for Learning and Online Teaching
ORGANIZATIONAL MODEL: Community-based with free, open individual or partner membership (with annual institutional fee-based benefits).
FUNCTION: Improve the effectiveness of teaching and learning by increasing the quantity and quality of peer-reviewed online learning materials that can be easily incorporated into faculty-designed courses.
PRIMARY AUDIENCE: Academic community
SIZE: 2 sustaining institutions; 23 system and campus partners and affiliates; 13 professional societies and 9 digital libraries; 8 corporate sponsors; and 30,000 individual members.
13,000 learning materials (37% growth) organized in 15 discipline categories.
USE: Daily use with 1,000 new members monthly. [[152]]
ACCOMPLISHMENTS:
1. Established reputation for high quality and sustainability.
2. Development of Corporate Partnerships.
3. Development of JOLT (Journal of Online Learning and Teaching).
4. Provision of discipline-communities.
CHALLENGES:
1. High demand but limited resources.
GOALS OF NEXT GENERATION RESOURCE:
1. Increase membership and collection growth.
2. Expansion of faculty development services.
3. Extending the disciplinary model to additional areas of academic and workforce interest.

Those new to MERLOT have several options to familiarize themselves with its services and features. From MERLOT's Web site, users can access a brief video introduction (replete with faculty testimonials), listen to a presentation about MERLOT co-sponsored by the TLT Group, watch an interview with MERLOT's Executive Director, Gerry Hanley, or watch his longer video presentation, "Sharing Learning Objects: Serving MERLOT to Higher Education." [[153]] In summarizing what makes MERLOT work effectively, Hanley emphasizes these characteristics:

  • We create a common means to individual ends.
  • You get more than you give.
  • You have a fair share in decision-making and participation.
  • We hold true to academic values.
  • We provide visibility, accountability and sustainability.
  • You trust us to deliver high quality services.
MERLOT has an organizational partnership structure that defines levels of participation and obligations, including annual membership fees and in-kind support. [[154]] There are three broad organizational categories:
  • higher education institutions
  • non-profit institutions (professional societies and digital libraries)
  • corporations
And four levels of participation:
  • Affiliate: joint advocacy but a low level of cooperation; requires an application or MOU.
  • Project-level: collaborate in MERLOT initiatives and pay a $6,500 annual fee (for campuses, or a negotiated rate for other organizational types), with in-kind support required for projects.
  • Community: participate in MERLOT leadership and collaborate on projects; pay a $25,000 annual fee, with $50,000 to $100,000 in-kind support required for leadership and initiatives.
  • Sustaining: lead a MERLOT initiative, participate in MERLOT management, and pay a $50,000 annual fee plus $250,000 in required in-kind support.
The Partnership Comparison Chart provides details of membership benefits for institutions of higher education in the areas of training, involvement in MERLOT leadership, collaboration and evaluation opportunities, and access to MERLOT member-only resources. [[155]]

Since 2003, MERLOT has expanded its international outreach and content through strategic alliances in Canada, Europe, and Australia. CLOE, the Co-operative Learning Object Exchange led by the University of Waterloo (Ontario, Canada), is now a major sustaining partner alongside the California State University. The ARIADNE Foundation for the European Knowledge Pool, a distributed network of learning repositories, has become a MERLOT partner. In addition, MERLOT, ARIADNE, and EdNA, the Education Network Australia of learning repositories, each offer federated searches across their collections individually or collectively. [[156]] They are also all members of the consortium GLOBE (Global Learning Objects Brokered Exchange), along with eduSourceCanada and the National Institute of Multimedia Education (NIME) in Japan. [[157]]

MERLOT has also strengthened its corporate partnerships, which include O'Reilly Media and Sun Microsystems, three learning management systems (ANGEL Learning, Blackboard/WebCT, Desire2Learn), and two library systems (Ex Libris Ltd. and Sentient Learning). [[158]] These partnerships result in mutually beneficial services such as the seamless integration of MERLOT resources via Blackboard and ANGEL. [[159]] A similar service with WebCT will be available in July 2006. As a matter of principle, MERLOT signs only non-exclusive agreements with vendors. It has assigned different values to its functions as follows:

  • Basic Search - gratis
  • Basic RSS feeds - gratis
  • Advanced Search - nominal fee
  • Customized RSS feeds - negotiated fees
  • Federated Search - negotiated fees
  • Other Services - negotiated fees
Participating vendors are required to adhere to the standard MERLOT Metadata Services Agreement in which MERLOT maintains control over the use of its technology, preventing institutions from harvesting its metadata. MERLOT's metadata is IEEE LOM or IMS metadata compliant, but it is not OAI-compliant nor is it available for export. As a result, MERLOT is only represented in NSDL at the collection-level. MERLOT does offer a search service, which is a Web service that allows remote searching of the MERLOT metadata and returns results in an XML format for display by the requester (as in the case of MERLOT's initiative with Blackboard).
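To make the search-service pattern concrete, the following is a minimal sketch of how a requester might turn an XML result set into display lines. The element names (`results`, `material`, `title`, `url`) and the sample response are hypothetical, invented purely for illustration; the actual MERLOT response format is not described in this report.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical response: element names are assumptions, not MERLOT's real schema.
SAMPLE_RESPONSE = """
<results>
  <material>
    <title>Example Physics Simulation</title>
    <url>http://example.org/sim</url>
  </material>
  <material>
    <title>Example Statistics Tutorial</title>
    <url>http://example.org/stats</url>
  </material>
</results>
"""


def format_results(xml_text: str) -> List[str]:
    """Parse an XML result set and return one display line per material."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for material in root.findall("material"):
        # findtext returns the default when the child element is absent
        title = material.findtext("title", default="(untitled)")
        url = material.findtext("url", default="")
        lines.append(f"{title} <{url}>")
    return lines


display_lines = format_results(SAMPLE_RESPONSE)
```

The point of the pattern is that the metadata stays with the provider: the requester receives only the records matching a query and is responsible for rendering them.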

In July 2005, MERLOT inaugurated JOLT: Journal of Online Learning and Teaching as a peer-review, open access vehicle to promote the scholarship of technology-enabled teaching and learning in higher education. JOLT serves as another forum in which the MERLOT community can express and examine issues of common concern.

MERLOT offers various avenues for users to keep abreast of recent developments besides its "What's new" page, quarterly email newsletter the Grapevine, and press releases. It supports syndication (RSS), and from the home page, users can quickly link to the most recently added resources (225 items), new member profiles (845), and peer-reviewed resources (26) contributed in the last thirty days.

MERLOT's basic user interface is the same as reported in 2003, but there are a number of new or previously unrecorded features. The Advanced search functions permit users to limit their queries by a number of unique qualifiers going well beyond subject, material type, technical format, language, audience, and cost. These include: learning management system compatibility (Blackboard/WebCT, Desire2Learn), iPod items, Section 508 compliant items (which conform to minimal disability access standards), copyright restrictions, and availability of source code. In addition, searches can be restricted to peer-reviewed resources (further refined by minimum rankings), member comments (further refined by user rankings), availability of assignments, and author snapshots. Author snapshots utilize the KEEP Toolkit developed by the Carnegie Foundation for the Advancement of Teaching to produce an illustrated synopsis (e-portfolio) of the educator's rationale, motivation, and impact on teaching and learning in developing the resource.

It is worth noting that some of these filters restrict the results to a very limited sub-set. For example, whereas 85 to 90 percent of the resources have been reviewed by faculty, only about 15 percent actually have published "peer reviews" in MERLOT (<2,000 items). [[160]] According to MERLOT representatives, the comparatively low proportion of peer-reviewed resources is attributable to a combination of factors, including the amount of time required by faculty, the author's consent, and the quality of the material. Consequently, resources deemed of lesser interest do not receive MERLOT Peer Review. Results can be sorted by five different variables (title, author, date entered, rating, item type). It is possible to conduct sub-searches within the result set.

Source: http://fedsearch.merlot.org/main/search.jsp (March 2006)

In addition to federated searches across ARIADNE and EdNA's learning object repositories, MERLOT offers two subject-based federated searches: physics (covering MERLOT Physics and ComPADRE-Digital Resource Collections for Physics and Astronomy Education, developed as an NSDL Pathway) and teaching and technology (covering MERLOT resources and the University of Carolina's Professional Development Portal). Currently in test is a federated search from the MERLOT Information Technology portal into IEEE Computer Society's extensive digital library (http://www.computer.org/). A new version of the MERLOT Web site is currently under development and planned for release at the MERLOT International Conference in August 2006.

4.3.6 Current Issues and Future Directions

Each in its own way, these services face organizational challenges to increase content and usage. NSDL is developing "pathways," exemplified by NEEDS and BEN, to focus resources for particular audiences and coalesce services across sectors. NEEDS is merging with TeachEngineering to serve the full spectrum of K-12 to lifelong learners. BEN has excelled at developing models for transforming smaller organizations to become contributors to digital libraries. DLESE is developing tools to support distributed cataloging of multiple collections and different metadata frameworks. MERLOT is bringing in new international and corporate partners. This cohort has developed a number of effective marketing and outreach vehicles to secure and extend their user base:

  • NSDL now offers more interactive communications features from its Web site and is organizing more teacher workshops;
  • NEEDS offers digital repository services to other organizations; oversees the Premier Award to recognize outstanding courseware, and displays monthly theme pages;
  • BEN is developing a faculty campus representative program;
  • DLESE supports a distributed, Web-based cataloging tool and is working with NSDL and Syracuse University to incorporate state and national standards into its database; and
  • MERLOT inaugurated JOLT: Journal of Online Learning and Teaching as a peer-review, open access vehicle to promote the scholarship of technology-enabled teaching and learning in higher education.

An essential ingredient of their success is quality assurance of content, one aspect of which is peer review. The user interfaces of SMETE/NEEDS, DLESE, and MERLOT support filters for peer-reviewed items. However, the actual proportion of such items is relatively low. A search limited to peer-reviewed resources returns only 25 results in SMETE or NEEDS. Less than 15 percent of MERLOT's resources are peer-reviewed, and only 700 of DLESE's 12,000 resources are part of its reviewed collection. Although BEN has made considerable gains (a 310 percent increase since 2003), peer-reviewed resources constitute less than 15 percent of its database as well. This suggests that peer review in the digital realm is still at an early stage of acceptance and is not well integrated into faculty traditions and reward systems. According to an NSDL study underway by Alan Wolf, "The science faculty that he studies claim to trust neither peer review nor community vetting; instead, they simply rely on their own personal judgment in every case of using an OER [online educational resource], or they consult with a trusted colleague" (Harley et al. 2006, 166) [[161]].
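As a rough check on the proportions cited above, the shares can be computed directly from the figures given in this section. This is a small illustrative calculation only; the helper name `share` is ours, and the MERLOT figure is an upper bound because the report states only that fewer than 2,000 of its roughly 13,000 learning materials have published peer reviews.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# DLESE: 700 reviewed resources out of 12,000
dlese_share = share(700, 12_000)

# MERLOT: fewer than 2,000 peer-reviewed items out of about 13,000 materials,
# so this value is a ceiling rather than an exact share
merlot_ceiling = share(2_000, 13_000)

print(f"DLESE reviewed collection: {dlese_share}% of resources")
print(f"MERLOT peer-reviewed items: under {merlot_ceiling}% of materials")
```

On these figures DLESE's reviewed collection is roughly 6 percent of its holdings, which is well below the "less than 15 percent" reported for MERLOT and BEN.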

These services also face the challenge of meeting the diverse needs of an expanded user base, particularly those that attempt to span the "K to gray" clientele. Research studies sponsored by NSDL, among others, reveal considerable differences by education sector in terms of what teachers need to integrate digital resources into their pedagogy (California Digital Library 2004, Hanson and Carlson 2005, Harley et al. 2006). NSDL's pathways are intended to target resources and services to particular audiences, but it remains to be seen if these services can effectively serve diverse and sizeable constituencies that have widely varying needs and operate in different conditions. NSDL, in particular, notes the "great diversity in evaluation methods and tools across 190+ NSDL digital library projects." This is corroborated by the CSHE study, which reports that six NSDL collections included in their review "used almost completely different metrics to describe themselves and their use" (Harley et al. 2006, 157).

While these services are making strides to integrate their resources into other services (e.g., NSDL's incorporation into academic library portals and science.gov; MERLOT's federated search system and partnerships with WebCT/Blackboard; DLESE's partnership with GEONgrid), it remains to be seen how they will join up with other national and international communities of practice formed around e-learning technology platforms and e-learning frameworks. How do their efforts mesh, for example, with international efforts to make content object repositories interoperable such as CORDRA (Content Object Repository Discovery and Registration/Resolution Architecture, http://cordra.net/) or the IMS Global Learning Consortium (http://www.imsglobal.org/) (Kraan and Mason 2005)?

Finally, financial sustainability is a major challenge, cited particularly by NSDL and MERLOT, but also evident in responses from the other services. Through the efforts of its Sustainability Standing Committee, NSDL is tackling this issue by formulating a decision-tree and providing its constituent projects with information about establishing marketing and business plans; however, NSDL as a whole, like other services in this report, attests to the need for more public and private funding options. The California Digital Library's market assessment of NSDL suggests that "academic libraries see limited value in another Web science portal, but would be willing to consider paying for deep integration with their existing search tools" (California Digital Library 2004, 3). Even MERLOT, which has a fee-based membership structure, identifies the challenge of "high demand, but limited resources." Nor can MERLOT count on maintaining its current membership base. The CSHE study of "Use and Users of Digital Resources" notes that while MERLOT (alongside a handful of other services) "could function on an existing base of support, budgetary volatility encouraged them to continuously watch for new funding opportunities" (Harley et al. 2006, 147).

4.4 Joining Forces: Cultural Heritage and Humanities Scholarship

At present, we have the opportunity to reintegrate the cultural record, connecting its disparate parts and making the resulting whole available to one and all, over the network. . . . Like most grand challenges, this one can be simply stated: make it possible for people to explore the totality of our accumulated global cultural heritage, now scattered throughout libraries, archives, or museums.
ACLS, Cyberinfrastructure in the Humanities & Social Sciences, 2005

The eleven services under review in this section serve as exemplars of ways in which librarians, archivists, educators, and scholars are collaborating to build digital collections and tools in support of cultural heritage and humanities scholarship. The discussion begins with two services that bridge the cultural divide by presenting collections and content from libraries, museums, and archives in a unified way. Cornucopia, sponsored by the Museums, Libraries and Archives Council (UK), serves as a single point of access for resource discovery, based on 6,000 collection-level descriptions from 2,000 institutions in the UK. Since the 2003 DLF report appeared, Cornucopia has begun to make its collection metadata available via OAI and SOAP. Further, it served as a model for a new project in the US led by the University of Illinois, namely the IMLS Digital Collections and Content gateway to digital projects funded by the IMLS National Leadership Grant Program. The Institute of Museum and Library Services (IMLS) is an independent grant-making agency of the federal government whose mission is "to lead the effort to create and sustain a 'nation of learners'" (http://www.imls.gov/).

Both projects use the RSLP (Research Support Libraries Programme) Collection Level Description (CLD) Metadata Schema which enables consistently formatted descriptions to be created and linked through parent-child relations and association relationships (as depicted in Figure 32), building on entity relation models for collection descriptions (Healey 2000, 2005) [[162]]. In addition, these projects are informed by NISO's (National Information Standards Organization) "A Framework of Guidance for Building Good Digital Collections" (2nd edition, 2004) and the NISO Metadata Initiative, described in the next section of this report. [[163]]

The DLF's Digital Collections Registry, which is also maintained by the University of Illinois, is briefly described before turning to three services included in the 2003 DLF survey: the Library of Congress's American Memory, the Sheet Music Consortium, and the Collaborative Digitization Program's (formerly Colorado Digitization Program) Heritage West (formerly Heritage Colorado). These represent various models of fostering cooperative digital collections and aggregating at the international, national, and regional levels.

Two pilot projects, The American West and DLF Aquifer, sponsored by the California Digital Library and the Digital Library Federation respectively, are starting to put into practice many of the lessons learned from previous collaborative projects. They are pooling digital content and building tools and services targeted to particular audiences. Meanwhile, Emory University's capstone initiative, SouthComb, leverages its prior digital initiatives, including AmericanSouth (covered in the 2003 DLF survey), to create a scholarly portal for Southern Studies.

Two scholar-driven projects round out this section. Since 2003, the Perseus Digital Library (PDL) has rebuilt its text system, released a new Web site, and launched a named entity browser. It plans to migrate its core data to the Tufts Institutional Repository in order to concentrate on research and development activities. Once PDL research applications prove viable, they will move to the IR's production server. Finally, NINES (Networked Interface for Nineteenth-Century Electronic Scholarship) represents a new scholar-driven model of aggregating peer-reviewed work and presenting it for use along with a suite of interpretative digital tools. Led by Jerome McGann, the John Stewart Bryan University Professor at the University of Virginia and editor of the acclaimed Rossetti Archive, NINES has garnered endorsements from five disciplinary societies and a host of other influential humanities computing organizations and projects.

4.4.1 Cornucopia

Update Table 19: Cornucopia based on DLF Survey responses, Fall 2005
ORGANIZATIONAL MODEL: Museums, Libraries and Archives Council (MLA), UK
SUBJECT: Cultural heritage
FUNCTION: A single point of access for resource discovery based on collection level descriptions.
SIZE: > 6,000 collection descriptions from 2,000 institutions.
USE: Not available
ACCOMPLISHMENTS:
1. Growth of contributions and descriptions.
2. Facility of locations to create/maintain their own descriptions.
3. Availability of data via OAI and SOAP.
2. Reconciliation of RSLP CLD Schema with different sector schemas.
3. Standardization of terminology
GOALS OF NEXT GENERATION RESOURCE:
1. Integration with Archive collections.
2. Extension of SOAP target

Developed by the Museums, Libraries and Archives Council (MLA), Cornucopia is a searchable database of some 6,000 collection descriptions emanating from 2,000 cultural heritage institutions in the UK. In spring 2004 Cornucopia migrated to a new software system and realigned almost all of its data structure to conform to the RSLP (Research Support Libraries Programme) Collection Level Description Metadata Schema (Turner 2005). The new system enhanced Cornucopia's functionality. Contributors can now edit and enter their collection data through a Web-based direct entry client; moreover, Cornucopia's data became available for OAI harvesting and Web service access. This enables interoperability among cultural heritage sites in the UK. For example, the People's Network Discover Service (http://www.peoplesnetwork.gov.uk/discover/) is harvesting Cornucopia data and making it searchable as one component of an aggregation harvested from an increasing number of cultural heritage sources. MLA's longer term vision is to provide integrated access to a wide range of data from the cultural sector, in which Cornucopia figures prominently.

As Cornucopia expands to incorporate more heterogeneous resources from a broader institutional base (e.g., many more library collections) and a wider user base, UKOLN has undertaken a strategic review of indexing options. A series of reports issued in September 2005 and January 2006 present comparative analyses of alternative thesauri, name authority files, and controlled vocabularies; recommend preferred indexing conventions for Cornucopia; and outline action plans for implementation. Among the key recommendations are to use the UK Archival Thesaurus (UKAT) for subject indexing and to abandon Cornucopia's current place indexing and use certain sections of the UNESCO Thesaurus instead. The time browsing page will be overhauled and new audience values and collection strength information added. New "Contributor Guidelines" give examples of how to assign appropriate index terms for subjects, places, time periods, names, audience levels, and collection strength based on UKOLN's findings. [163b]

Cornucopia's search and retrieval features have improved since 2003; however, in view of the new indexing recommendations, the description of its current functionality is provisional. Collections can be browsed by category: time, people, place, subject, culture (e.g., Ancient Greece, Jewish, Maya, Viking), and institution. The user interface supports hierarchical, faceted browsing by subject. There are 21 broad subject categories (e.g., Education, Events, Information and Communication). In advanced search mode, users can narrow a collection title search by time period, place, type of institution (library, archive, or museum), or county. Alternative keywords are suggested to expand the search, based on the UK Archival Thesaurus. Results are returned with brief annotations and link to full records that include (a) a collection summary, (b) location details (directory information about the institution), and (c) additional collection information; in some instances, there are links to the item via the institution's catalog. The "collect me" feature allows users to gather and save search results during a session for printing or emailing.

In addition, users can perform a search by postal code to locate collections in a particular location or conduct a search within or across three other Web services, including Cecilia: Find Music Collections in the UK and Ireland; Darwin Country; and Google. At present, no explanations are given to users about the coverage of the other services. However, Cecilia is a database of some 1,800 collection descriptions of music resources held in 600 libraries, archives and museums in the UK and Ireland (http://www.cecilia-uk.org/). Darwin Country, a partnership of several regional museums, focuses on the history of science, technology and culture in the West Midlands during the 18th and 19th centuries; it is affiliated with the UK's "Curriculum Online" initiative. [[164]] Among other features, Darwin Country enables the exploration of artifacts consisting of nearly 12,500 historic images (http://www.darwincountry.org/).

Besides Cecilia, various other digital projects in the UK have chosen to use Cornucopia's software and will provide their own user interfaces. They include:

4.4.2 IMLS Digital Collections & Content (DCC)

Update Table 20: IMLS Digital Collections & Content based on DLF Survey responses, Fall 2005
IMLS Digital Collections & Content
ORGANIZATIONAL MODEL: Collaboration among UIUC Library, UIUC Graduate School of Library & Information Science, IMLS
FUNCTION: Registry and repository with search and discovery tools for integrated access to content of IMLS National Leadership Grant (NLG) collections.
PRIMARY AUDIENCE: Academic Community
SIZE: Registry: 151 NLG collections plus 100 brief descriptions of related collections.
Metadata repository: 266,000 records from 85 IMLS NLG collections.
USE: Not available
ACCOMPLISHMENTS:
1. Creation of IMLS NLG Collection Registry with rich data input system & browse interface.
2. Helping NLG projects develop OAI data providers and promulgating OAI Best Practices.
3. Development of IMLS metadata repository
CHALLENGES:
1. Maintenance
2. Keeping up with changing standards
GOALS OF NEXT GENERATION RESOURCE:
1. Continue to add all new IMLS NLG grants to the Registry.
2. Continue to assist IMLS NLG grantees in setting up OAI data providers.

A collaborative initiative led by the University of Illinois, Urbana-Champaign (UIUC) Library and Graduate School of Library & Information Science, this gateway is intended to bring greater visibility and utility to digital collections funded by the IMLS (Institute of Museum and Library Services). The DCC serves as both a registry of collection-level descriptions of National Leadership Grant (NLG) projects and a metadata repository of item-level records from a subset of these collections. In its next phase of development (funded through 2007), the DCC expects to add a sample of digital collections funded via IMLS to State Library Administrative Agencies in support of the Library Services and Technology Act (LSTA).

Integral to the development of the DCC, the principal investigators are testing the assumptions of the NISO/IMLS Framework of Guidance for Building Good Digital Collections (2004), namely how the registry and repository might serve as "infrastructure components" with "the potential to facilitate the reuse of digital content in new and different ways - by enabling more effective search and discovery across multiple collections and among and between individual information objects that will allow communities of scholarly interest to view an information landscape as best meets their needs" (Cole and Shreeves 2004, 309). Specifically, the DCC experiments with OAI-PMH interoperability best practices in terms of collection identity, metadata normalization and enrichment for specific audiences, and portal interface and functional design issues (Cole 2006).

In creating the collection registry model, the DCC draws on research about how to define and describe collections, ultimately opting to adapt the RSLP Collection Description Schema and the Dublin Core Collection Description Application Profile [[165]] (Cole and Shreeves 2004, 312). Taking into consideration similar projects, in which Cornucopia figured prominently, the DCC arrived at a collection description metadata schema with four classes of entities:

  • Collections
  • NLG projects
  • Institutions
  • Administrators (Ibid, 317-18)
Like Cornucopia, DCC enables contributors to add and edit their collection information. The DCC Web site offers full details about the metadata schema as well as a diagram illustrating the relationship among entities. The principal investigators elaborate:
A collection may have been created by multiple NLG projects and have multiple administrators. A collection may only have one hosting institution, but may have multiple contributing institutions. A collection may have multiple sub-collections, complementary collections, or source physical collections. A NLG project may have only one administering institution, but may have multiple participating (or collaborating) institutions.

Source: http://imlsdcc.grainger.uiuc.edu/collections/about.htm
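The cardinalities described in the passage above can be sketched as a small data model. This is an illustrative reading of the four entity classes only, not the DCC's actual schema: all class and field names are hypothetical, with single-valued fields standing for "only one" relationships and lists standing for "may have multiple."

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Institution:
    name: str


@dataclass
class Administrator:
    name: str


@dataclass
class NLGProject:
    title: str
    # A project has exactly one administering institution...
    administering_institution: Institution
    # ...but may have several participating (collaborating) institutions.
    participating_institutions: List[Institution] = field(default_factory=list)


@dataclass
class Collection:
    title: str
    # A collection has exactly one hosting institution.
    hosting_institution: Institution
    # Everything below is zero-or-more.
    projects: List[NLGProject] = field(default_factory=list)
    administrators: List[Administrator] = field(default_factory=list)
    contributing_institutions: List[Institution] = field(default_factory=list)
    sub_collections: List["Collection"] = field(default_factory=list)
    complementary_collections: List["Collection"] = field(default_factory=list)
    source_physical_collections: List[str] = field(default_factory=list)


# Placeholder data, invented for the example
host = Institution("Example State Library")
project = NLGProject("Example NLG Project", administering_institution=host)
parent = Collection("Example Collection", hosting_institution=host, projects=[project])
parent.sub_collections.append(
    Collection("Example Sub-collection", hosting_institution=host)
)
```

Modeling sub-collections as nested `Collection` objects mirrors the parent-child relations that the RSLP-derived schema is meant to express.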

A second major component of the project involves enabling cross-collection searching of item-level metadata using OAI-PMH to facilitate interoperability. To this end, the DCC deployed several strategies to help participating collections become OAI data providers, including implementing an OAI Static Repository for some projects and working with CONTENTdm (already in use by other projects) to support "resumption tokens" that help to control the flow of records in manageable chunks to the DCC. As a result, more collections are able to contribute item-level records to the DCC. Nevertheless, as of this writing, only about half of the collections have associated item-level records. Noting that the absence of item-level metadata is particularly prevalent for exhibit and learning object focused projects, Cole and Shreeves offer other reasons why NLG projects are not yet OAI-compliant:

  • The digital collection is not yet public.
  • The technical infrastructure is not in place.
  • The technical infrastructure is in migration, for example, migrating to a new content management system.
  • All collaborators in a particular project have not reached agreement to share metadata via OAI.
Finally, the DCC principal investigators continue to struggle with issues related to metadata quality and harmonization of different controlled vocabularies in use by the majority of collections contributing item-level metadata. [[166]]

Over the next two years, the principal investigators expect to integrate collection-level and item-level services as well as customize the interface and metadata design for targeted audiences. At present, the DCC offers two distinct services with separate interfaces: the IMLS DCC Collection Registry and the IMLS Digital Content Gateway. As of January 2006, the registry represents 158 IMLS NLG projects as manifest in 108 primary NLG collection records with 40 additional sub-collection records and 29 associated collections (Cole 2006). Collections are classified according to the Gateway to Educational Materials (GEM) subject schema. As evident from Table 22 below, most collections are assigned more than one subject and contain multiple types of objects. At one extreme, Infomine is assigned to all subjects except Educational Technology and Physical Education. It is also the sole resource classified as Philosophy and all of its sub-categories (Aesthetics, Epistemology, Existentialism, Marxism, and Phenomenology), as well as all seven sub-categories of Mathematics.

Table 22: IMLS Digital Collections Registry Subject Areas and Object Types (April 2006) N=123 NLG collections plus 40 sub-collections
Subject Area | Count | Object Type | Count
Educational Technology | 9 | Image | 129
Foreign Languages | 5 | Interactive Resource | 17
Health | 7 | Moving Image | 11
Language Arts | 14 | Physical Object | 46
Physical Education | 2 | Unknown | 2
Social Studies | 131 | |
Vocational Education | 9 | |

The GEM classification scheme seems out of balance with the subject coverage of the collections in the current deployment of the registry. Around 80 per cent of the collections are classified under "Social Studies" and, within this subject, 104 (or 65 per cent of all collections) fall under "United States History."

In addition to browsing by subject, users can browse by Object, Place, Title, National Leadership Grant project, and Host Institution. The majority of the collections contain multiple types of images (233) and texts (202). Within these categories, it comes as no surprise that photographs, slides and negatives (105 collections) and books and pamphlets (66 collections) dominate. The registry supports basic and advanced searches. In advanced search mode, users can limit their queries to eight different object types (as noted above). Each entry is linked to the collection's home page, an extensive record about the collection, information about related collections and an annotation about the corresponding NLG project.

Users can link to the Collections Gateway via the "Home" button at the bottom of the screen. (There is no straightforward means to toggle back and forth between the registry and the gateway.) The gateway site supports fielded searches (by Title/Subject/Description, Author/Artist/Creator, Type, Date, and Publisher) that deploy basic Boolean operators (AND, OR, NOT), and queries can be limited to all or selected collections (available via a drop-down menu). At present there are 32 collections with item-level records, including two that have multiple sub-sets: Heritage Colorado (now Heritage West, as discussed in this report) with 22 sub-sets and Museums & the Online Archive of California (MOAC) with 31 sub-sets, for a total of 85 collections altogether.
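
How fielded Boolean searching behaves can be illustrated with a short sketch. It is not the gateway's implementation: the record layout, field names, and sample records below are hypothetical, and matching is reduced to a case-insensitive substring test.

```python
def matches(record, field, term):
    """True if `term` occurs (case-insensitively) in the named field."""
    return term.lower() in record.get(field, "").lower()


def search(records, must=(), any_of=(), must_not=()):
    """Filter records with AND (must), OR (any_of) and NOT (must_not) clauses.

    Each clause is a (field, term) pair.
    """
    hits = []
    for record in records:
        if not all(matches(record, f, t) for f, t in must):
            continue
        if any_of and not any(matches(record, f, t) for f, t in any_of):
            continue
        if any(matches(record, f, t) for f, t in must_not):
            continue
        hits.append(record)
    return hits


# Hypothetical item-level records, for illustration only.
RECORDS = [
    {"title": "Dance program, 1912", "type": "text", "collection": "A"},
    {"title": "Folk dance film", "type": "moving image", "collection": "B"},
    {"title": "Mining camp portrait", "type": "image", "collection": "A"},
]

# Title contains "dance" AND NOT type "text".
dance_not_text = search(RECORDS, must=[("title", "dance")], must_not=[("type", "text")])
```

Limiting a query to selected collections amounts to adding one more clause on the collection field.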

Users may choose to have results returned in order of relevance. Users can also specify their preference to display all records in short form on one page (up to a maximum of 500); otherwise twenty results are displayed per page. Each entry is linked to its host identifier; users can view the complete metadata record or add results to a Book Bag and save them (in XML) to a disk. In the left-hand frame of the screen, the search results are summarized according to the collection (and sub-set) to which they are attached. Users can modify or review their search history.

As evident from the table below comparing browsing and searching within and across the registry and gateway, the same query is likely to retrieve different results. In the three examples below, only <human sexuality> retrieves the same collections when using the browse and search features of the registry. However, it returns no hits in the content gateway (HEARTH does not make item-level metadata available). Meanwhile, in the case of <dance>, only two collections (the ubiquitous Infomine and Folkstreams.net) are classified with this GEM subject in the registry, but a search of collections in the registry retrieves a third collection pertaining to the subject, Masterworks Online. Despite its smaller universe of collections, searching the gateway site identifies 85 collections (comprising 32 collections and 53 sub-sets) with more than 1,000 records relevant to "dance."

Table 23: Comparison of Browse and Search Results in the IMLS DCC Collections Registry and Content Gateway (April 2006)
Digital Collections Registry
Browse by Subject
HEARTH (Home Economics Archive: Research, Tradition, and History)105 collections
Digital Collections Registry
Search by Subject
Masterworks Online
HEARTH (Home Economics Archive: Research, Tradition, and History)107 collections
Digital Content Gateway
Search by Keyword
44 collections and sub-collections
More than 1,000 item-level records
None of the 3 collections above included in results
<sexuality> retrieves 10 records from 3 collections and sub-collections but not HEARTH
33 collections and sub-collections
More than 1,000 item-level records

These results illustrate the difficulties that lie ahead as the developers strive to integrate the registry and gateway services into a coherent framework.

4.4.3 DLF Digital Collections Registry

This new registry, maintained by the University of Illinois, describes the digital collections hosted or contributed by DLF member institutions and allies that are publicly available and OAI-compliant. As of May 2006, it comprises more than 750 collections from 32 institutions in 19 states plus the British Library (UK). Most of the repository descriptions are based on the collection description schemas that were developed for the IMLS DCC project. Still at an early stage of development, the site has yet to publicize a collection policy and review its current listings against those criteria. It is accessible from http://gita.grainger.uiuc.edu/dlfcollectionsregistry/browse/.

The DLF Registry has the same user interface as the IMLS DCC Web site, with similar browsing and search options. Browsing collections by time and place reveals that most collections treat late 19th-century and early 20th-century resources about North America. However, the registry embraces collections from ancient to modern times and spans from Africa to South Asia. It is also possible to browse by institution and project. So far only one project is listed: American Culture, embracing 46 collections.

Table 24: Number of DLF Digital Collections by Hosting or Contributing Institution (June 2006)
Institution | Collections
British Library | 17
CALIFORNIA
California Digital Library | 11
Stanford University | 15
University of California-Berkeley | 15
University of Southern California | 26
CONNECTICUT
Yale University | 31
DISTRICT OF COLUMBIA
Library of Congress | 112
GEORGIA
Emory University | 28
Oxford College | 1
ILLINOIS
University of Chicago | 27
U of Illinois at Urbana-Champaign | 16
INDIANA
Indiana University | 17
MARYLAND
Johns Hopkins University | 10
MASSACHUSETTS
Harvard University | 24
MIT | 30
MICHIGAN
University of Michigan | 24
MINNESOTA
University of Minnesota | 5
NEW HAMPSHIRE
Dartmouth College | 2
NEW JERSEY
Princeton University | 12
NEW YORK
Columbia University | 18
Cornell University | 65
New York Public Library | 17
New York University | 7
NORTH CAROLINA
North Carolina State University | 9
PENNSYLVANIA
Carnegie Mellon University | 20
Pennsylvania State University | 15
University of Pennsylvania | 23
TENNESSEE
University of Tennessee | 11
TEXAS
University of Texas at Austin | 13
VIRGINIA
University of Virginia | 85
WASHINGTON
University of Washington | 36
TOTAL COLLECTIONS | 754

This registry promises to make more visible the digital collections of prominent institutions. Eventually, it should mesh with the DLF Portal (described in section 4.1.8) to offer seamless collection to item discovery and access.

4.4.4 American Memory and Other OAI Digital Collections at the Library of Congress

Update Table 21: American Memory and other OAI Digital Collections at the Library of Congress based on DLF Survey responses, Fall 2005
American Memory and Other OAI Digital Collections
ORGANIZATIONAL MODEL: Pilot phase w/ public/private partnership ended; now mainstreamed into LC operations.
SUBJECT: Cultural heritage
FUNCTION: Presents digital content from American Memory, LC Presents: Music, Theater & Dance, Veterans History Project, and Prints & Photographs Online Catalog.
PRIMARY AUDIENCE: Interested public and educators.
SIZE: 130 collections (30% growth), over 10 million digital items (43% growth), 215,250 OAI-harvestable records (58% growth).
USE: Per day: 200,000 page views and 15,000 searches
ACCOMPLISHMENTS:
1. Added 3 million items and about 30 collections.
2. Veterans History Project (VHP) and I Hear America Singing projects demonstrate XML-based approaches.
3. Global Gateway collaborative partnerships with 6 national libraries and other organizations.
CHALLENGES:
1. Dealing with multilingual materials in search and display.
2. Creating search across more than one digital conversion project (e.g., American Memory and Global Gateway).
3. Preparation for proposed World Digital Library.
TOOLS OR RESOURCES NEEDED: Tools to support multilingual search.
GOALS OF NEXT GENERATION RESOURCE: Have not yet finalized such goals.

Although the pilot public/private partnership aggregating collections into American Memory has ended, the Library of Congress remains at the forefront in facilitating standards-based digital aggregations and interoperability. In November 2005, Librarian of Congress James Billington announced LC's campaign to create the "World Digital Library" (WDL) with an initial $3 million contribution from Google (Vise 2005). LC's impressive "Global Gateway" to multilingual resources on world cultures already establishes the precedent of building collaborative digital collections in partnership with other national libraries.

Since the 2003 DLF report appeared, American Memory has grown considerably in size, adding new digitized collections from LC, implementing XML-based approaches to audio projects (e.g., Veterans History Project and the Library of Congress Presents: Music, Theater & Dance), and contributing records to the DLF MODS Portal. Its redesigned front page is a model of clarity and functionality, enabling users to:

  • Select a topic to browse collections or link to all browsing options
  • Link to the list of all collections
  • Look at the day's highlighted collections
  • Link for teachers to use American Memory in the classroom (via The Learning Page)
  • Submit a reference question to a librarian
  • Search all collections
  • Link to Help pages
  • Read about the history and mission of American Memory
  • Contact LC (four different options contingent on the nature of the query)

Source: http://memory.loc.gov/ammem/ (May 5, 2006)

In this way, it immediately addresses the varying needs of diverse users, ranging from novice to expert. At the secondary level:

  • Users can browse collections by all topics, time period, format, or place.
  • The complete list of collections can be sorted by title or subject with the option to view the full collection description.
The Learning Page aims to serve as the "front door" to American Memory's collections for teachers. In addition to dozens of teacher- and classroom-tested lesson plans, there are featured activities, examples of how to use the collections to develop critical thinking skills, and professional development opportunities, including "self-serve workshops" and tutorials. Teachers can also read "The Source" online newsletter with practical teaching tips, sign up for news alerts about American Memory or participate in monthly live, thematic "chat" sessions (archived transcripts are available as well).

Users can search across all collections or limit their search to specified collections by topic. Results can be displayed in two forms: the default list view (with links to the item and corresponding collection) or gallery view, with clickable thumbnail prints (or when not available, title with link).

An early leader in OAI adoption, LC makes item-level metadata available from American Memory, the Global Gateway and the Prints & Photographs Division's Online Catalog. This includes, for example, all records for LC's moving image materials included in the Moving Image Collections (Johnson 2006). Helpful background documents and guidelines for prospective OAI harvesters are available from the About page (see Technical Information). [[168]] Records are harvestable as sets organized by content type; when more than one set exists, there is the option to harvest individual sets or the combined set. As of May 2006, LC lists the following available records:

  • Books (11 individual sets, combined set)
  • Ephemera, Pamphlets (1 set)
  • Maps, Atlases (1 set)
  • Photos (26 individual sets, combined set)
  • Posters (2 individual sets, combined set)
  • Other Still Visual (4 individual sets, combined set)
  • Motion Pictures (1 set)
  • Sheet Music (1 set)
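
Selective harvesting by set, as described above, comes down to adding a `set` argument to an OAI-PMH request. The sketch below only builds the request URLs; the base URL and the set name used in the usage note are placeholders, not LC's actual endpoint or set identifiers, which a harvester would obtain from a `ListSets` response.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical endpoint, not LC's address


def list_sets_url(base_url=BASE_URL):
    """URL for the ListSets request, which enumerates the available sets."""
    return f"{base_url}?{urlencode({'verb': 'ListSets'})}"


def list_records_url(set_spec, metadata_prefix="oai_dc", base_url=BASE_URL):
    """URL for harvesting the records of one set (individual or combined)."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    return f"{base_url}?{urlencode(params)}"
```

For instance, `list_records_url("photos")` would request a set named "photos", assuming the provider had defined one under that name.
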
A number of aggregation services under review in this report harvest LC records (e.g., OAIster, Perseus, Sheet Music Consortium, American West, MetaScholar, DLF Aquifer, DLF MODS Portal). In addition, RLG Cultural Materials (a subscription resource) and RLG Trove.net (a free service associated with RLG Cultural Materials) both harvest LC's OAI metadata. LC will utilize the OAI protocol to update a centralized Virtual International Authority File (VIAF) currently under development by OCLC, Die Deutsche Bibliothek, and LC. Intended to serve the international cataloging community, VIAF will include records for Personal Names from selected national libraries (Arms 2003). [[169]]

4.4.5 Sheet Music Consortium (SMC)

Update Table 22: Sheet Music Consortium based on DLF Survey responses, Fall 2005
Sheet Music Consortium
ORGANIZATIONAL MODEL: 4 Partners: UCLA, Indiana, Johns Hopkins, Duke. Also harvest from: LC, Nat'l Library of Australia, Maine Music Box
SUBJECT: Humanities: Music
FUNCTION: OAI aggregator of sheet music.
PRIMARY AUDIENCE: General collection aimed at both general public and academic community.
SIZE: 110,000 (10% increase)
USE: Per month: 3,909 visits (average)
ACCOMPLISHMENTS:
1. Functional development complete & site officially published.
2. Addition of new harvested collections: NLA and Maine Music Box
CHALLENGES:
1. Digital collections that are potential targets are not OAI compliant.
2. Incompatible metadata standards
TOOLS OR RESOURCES NEEDED:
1. Easy to use software tools that would allow collections to become OAI compliant.
GOALS OF NEXT GENERATION RESOURCE:
1. Harvest additional collections.
2. Possible addition of sound recordings
3. Enriched metadata in order to provide better retrieval service.

Intended to leverage the research potential of digital sheet music collections, the Sheet Music Consortium has added two collections (the National Library of Australia and the Maine Music Box) to its aggregation since 2003 and is currently adding two more, from the University of Colorado, Boulder and the University of Missouri, Kansas City. The Library of Congress and the National Library of Australia have full digital images associated with the metadata records, whereas Indiana University and Duke have a mix of bibliographic metadata and digitized images. For sheet music published after 1922 (and therefore likely under copyright protection), UCLA provides access to the sheet music cover but not the sheet music itself. The Maine Music Box estimates that 62 percent of its collection is in the public domain. For items still under copyright (from 1931 forward), the Maine Music Box does not display images of the score or sound files.

Current data providers are listed below along with the number of metadata records [[170]]:

The SMC Web site lists more than 60 institutions that provide some type of public access to digital sheet music collections. Nevertheless, the SMC's aggregation from a mere seven collections contains many more examples of sheet music than other search engines or union catalogs are able to retrieve. SMC could fill a void if it succeeds in attracting more members into the consortium and developing into a full-scale, sophisticated community of practice.

Table 25: Results for search for "sheet music"
>25,000 items from 45 collections
IMLS Content & Collections Registry
>1,000 items from 10 collections
40 expert-selected resources (does not cite the SMC)
66 robot-selected resources (includes the SMC)
>20,000 bibliographic records
>16,000 sheet music with document type: scores
40 sheet music with document type: computer files
178 sheet music with document type: sound recordings
291 sheet music with document type: Internet resources
Sheet Music Consortium
>110,000 records from 7 collections

Regrettably, SMC does not provide collection descriptions, current harvesting statistics, or details about the number of records, such as those with bibliographic metadata that also have associated digitized images. It is possible, however, to limit searches to digitized sheet music only. The absence of collection-level descriptions is unfortunate since several contributing entities represent multiple special collections from different libraries. For example, the Maine Music Box is an aggregation of five collections drawn from the Bagaduce Music Lending Library and the Bangor Public Library.

The Sheet Music Consortium's user interface has not changed since 2003. It supports both basic and advanced searches, including limiting queries to digitized sheet music only. The primary advantage of using the SMC is the ability to search across multiple collections, coupled with the functionality that permits users to select records, add annotations and save (or email) items to a virtual collection that can be shared with others or reserved for personal use.

The following table shows options for creators of Virtual Collections. Only owners of collections can delete them. Collections without owners will be deleted annually.

Levels of Access and Protection in Virtual Collections
Collection State | View | Edit
View / Edit < Password 1 > | Requires Password 1 | Requires Password 1
View < Password 1 >, Edit < Password 2 > | Requires Password 1 | Requires Password 2

Source: http://digital.library.ucla.edu/sheetmusic/oaihelp.html

Future service enhancements-for example, distinguishing between composers and lyricists, providing access to descriptive elements like plate and publisher numbers, or specifying different types of dates-are hampered by limitations of the available metadata (Davison et al. 2003). As a result, the SMC offers sparse services when compared to the native environments of the constituent collections. Although SMC principals speculate about expanding to include other musical formats, they foresee "a danger in generalizing the service into to [sic] areas that may be better served by other means of discovery" (Ibid). Without any plans to enrich the legacy metadata or integrate SMC more fully into e-learning or e-research environments, SMC seems destined to remain an online union catalog of digitized sheet music with the potential of creating personal or shared virtual collections. While this does fill a need as discussed above, SMC might take a lesson from its partners and review features that they have implemented to develop a more ambitious vision of its future. PictureAustralia, for example, an aggregation that includes the NLA digital music collections, does incorporate different media and also permits discovery by theme.

Source: http://www.musicaustralia.org/

Duke's collection can be browsed by subject content type, illustration type, advertising, and decade (with topical categories). The Maine Music Box offers browsing by subject and sheet music cover art. Moreover, it offers the ability to listen to sound files and has created an instructional module with customized services. Still in its early stage of deployment, its developers believe that "it will take a new generation of music educators to use digital collections as instructional tools." Overall, they "would encourage a vision that provides tools for integrating sheet music collections with other digital libraries," especially promoting their relevance to social and cultural history. [[171]]

4.4.6 Heritage West (formerly Heritage Colorado)

Update Table 23: Heritage West based on DLF Survey responses, Fall 2005
(formerly Heritage Colorado)
ORGANIZATIONAL MODEL: Funded by Colorado Dept of Education, IMLS & NEH.
SUBJECT: Cultural heritage of the western U.S.
FUNCTION: Collaborative efforts of archives, historical societies, libraries & museums located in the western U.S. to make digital collections available to all online audiences.
PRIMARY AUDIENCE: Interested public, educators, researchers, life-long learners.
SIZE: 77 participating institutions (51% growth); 18 institutional members
ACCOMPLISHMENTS:
1. Became a regional collaborative organization.
2. Complete redesign of the CDP Web site in Nov. 2005 w/ user-testing in 2006.
3. Revision/update of CDP Dublin Core Metadata Best Practices document.
CHALLENGES: Sustained funding.
GOALS OF NEXT GENERATION RESOURCE: Hopes to launch a new interface for delivery of digital content that enables side-by-side comparison of digital objects, supports the use of METS records, and provides more interactivity for users.

Operating as a not-for-profit with 501(c)(3) status since 2002, the Colorado-based Collaborative Digitization Program (CDP) has expanded its two core goals beyond the borders of Colorado: (1) to achieve high quality digital access to cultural heritage collections and (2) to provide resources and training to create digital surrogates of primary source collections. It now works with partners across the western United States, including Arizona, Colorado, Kansas, Montana, Nebraska, Nevada, New Mexico, Utah, and Wyoming. CDP members (21 as of April 2006) pay an annual fee ranging from $100 to $2,500 based on their institution's operating and collection budgets. In 2005-06, CDP began to award member institutions vouchers for free participation in CDP-sponsored workshops or on-site training by CDP staff. CDP carries out its work through a Board of Directors, four staff members, and six working groups (Digital Collections, Digital Audio, Digital Imaging, Digital Preservation, Technology, and Metadata).

Source: http://www.cdpheritage.org/collection/

The re-designed Web site offers a multitude of options to meet the needs of varied users from searching CDP's two major collections (Heritage West and Colorado Historic Newspapers) to reading about upcoming CDP workshops, reviewing "Best Practices," linking to "Lesson Plans," or viewing a "Member Spotlight." The "Digital Toolbox" incorporates best practices in digital imaging, Dublin Core metadata, and digital audio; offers information about workshops; and connects to project management guides. "The Teacher Toolbox" is organized into three areas: Why Primary Sources? (links to other primary source sites geared to teachers such as American Memory's Learning Page), Lesson plans, and Professional development.

Heritage West (formerly Heritage Colorado) offers users the ability to conduct unified searches across the digital collections of 77 participating libraries, museums, archives, and historical societies. The new user interface supports basic and advanced searches, as well as searches by topical category. For example, in advanced search mode, users can limit their query to seven collections (comprising the original Heritage Colorado collection, the Denver Public Library and Colorado Historical Society's photographs and images collection, and the five components of the "Western Trails" collection-from Colorado, Kansas, Nebraska, Utah and Wyoming). Search results can be sorted by author, title or date, and saved for emailing. The results are also summarized according to the collection from which they are derived, offering an alternative means of accessing the items.

The Colorado Historic Newspaper Collection (CHNC), CDP's other major database, currently covers 86 newspapers (291,000 digitized pages) published in English, German, Spanish or Swedish in 46 cities and 34 counties throughout the state of Colorado from 1859 to 1928. New material is added on a monthly basis. After extensive user testing, CHNC launched a new search interface in November 2005. It enables users to search newspapers by region within the state and allows them to create a customized group of newspapers for searching. In December 2005, CHNC received a Library Services and Technology Act (LSTA) Continuation Grant from the Colorado State Library that will allow them to partner with the Denver News Agency to run six workshops for educators about the use of historic and current newspaper content in teaching.

As part of the IMLS-funded Digital Collections and Content gateway, the University of Illinois helped CDP to become OAI-compliant in 2003. OAIster now harvests more than 32,000 items from CDP. While CDP is a collaboration success story, it faces tough decisions about how best to federate searching across multiple databases and whether to maintain its own customized software system (DC Builder) or migrate to a commercial solution (Bailey-Hainer and Urban 2004). Reports about CDP are available at its Web site, including a recent presentation by Koelling and Shelstad (2006) summarizing CDP's experience with "Collaborative Digitization Programs."

4.4.7 The American West

Update Table 24: American West based on DLF Survey responses, Fall 2005
The American West
ORGANIZATIONAL MODEL: Sponsor: William & Flora Hewlett Foundation. Lead institution: California Digital Library. Project partners: CDP (e.g., Heritage West), Harvard, Indiana, LC, Michigan, Virginia, U of Washington.
SUBJECT: Cultural Heritage
FUNCTION: Build a virtual collection on the American West through metadata harvesting and investigate its viability as a tool to assist information resource providers like librarians to better leverage digital content for their specific audiences.
SIZE: Approximately 250,000 digital objects.
USE: Site not yet released.
ACCOMPLISHMENTS:
1. Developing a prototype harvest infrastructure.
2. Ability to ingest metadata-only records into a repository.
3. Concrete steps in developing metadata normalization/enrichment tools.
CHALLENGES:
1. Need a better understanding of the needs of audience(s) for OAI-harvested metadata aggregations.
2. Need easier-to-use tools for re-mediating and enhancing harvested metadata.
3. Need clearer use scenarios to drive continued development of OAI aggregation services.
TOOLS OR RESOURCES NEEDED: Widely available metadata normalization tools and tools supporting surfacing topical cohesiveness across highly heterogeneous aggregated collections (could be repository-defined or individual user defined collections).
GOALS OF NEXT GENERATION RESOURCE: This is an R&D project so "next generation" goals are uncertain.

The American West (AmWest) is an experimental project to build a regionally and thematically focused test bed of OAI-harvested metadata contributed by multiple institutions. Led by the California Digital Library (CDL), the project has assembled an estimated 250,000 objects contributed by eight partners: the CDL, the Collaborative Digitization Program (CDP), the Library of Congress, Harvard University, and four other university libraries (Indiana, Michigan, Virginia and Washington). Built on the basis of user needs articulated in a series of assessment workshops, AmWest intends to serve a diverse audience ranging from University of California and community college faculty to academic librarians, K-12 teachers, and public librarians. [[172]] In particular, it aims to develop tools to configure and integrate virtual collections with local personalized content as well as develop the capacity to deliver learning objects via various platforms such as WebCT. [[173]]

The project's user assessment reports offer a wealth of insights into the behaviors, needs and expectations of different user groups, while also identifying common ground. Key findings from user interviews resulted in the following recommendations:

  • Create separate gateways for classroom teaching vs. scholarly research
  • Develop interactive features to encourage learning and exploration
  • Support advanced search and filtering
  • Allow users to create and publish personal views of the collection
  • Longer term, encourage users and institutions to contribute local collections (CDL 2004)
User input helped to refine the broad topical categories that will form the basis of the site's hierarchical faceted browsing schema. As the table below shows, this resulted in numerous modifications to the proposed schema. For example, three categories were added to avoid over-reliance on "Society & Culture" as a "catch-all" category: Family & Community, Leisure & Travel, and Work & Labor. In other instances, categories were revised for precision: Arts became Arts & Architecture, Environment became Land & Resources, and Exploration & Migration became Westward Movement.

Table 26: American West's Broad Topic Categories
Proposed Broad Topic Categories
1. Agriculture
2. Arts
3. Business & Industry
4. Education
5. Exploration & Migration
6. Government & Politics
7. Military & War
8. Native Americans
9. Environment
10. Race & Ethnicity
11. Religion
12. Science & Technology
13. Society & Culture

Revised Broad Topic Categories
1. Agriculture
2. Arts & Architecture (revised)
3. Business & Industry
4. Education
5. Family & Community (added)
6. Government & Politics
7. Land & Resources (revised)
8. Leisure & Travel (added)
9. Military & War
10. Native Americans
11. Race & Ethnicity
12. Religion
13. Science & Technology
14. Society & Culture
15. Westward Movement (revised)
16. Work & Labor (added)

Source: Adapted from Appendix I, Poe 2005, 11 http://www.cdlib.org/inside/assess/evaluation_activities/docs/2005/survey_May2005_report.pdf
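
The relationship between the two schemas in Table 26 can be checked mechanically. The short sketch below takes the category names and the three renamings from the table and the accompanying text; the variable names are illustrative only. Applying the renamings to the thirteen proposed categories and comparing the result with the revised list isolates the three additions.

```python
PROPOSED = [
    "Agriculture", "Arts", "Business & Industry", "Education",
    "Exploration & Migration", "Government & Politics", "Military & War",
    "Native Americans", "Environment", "Race & Ethnicity", "Religion",
    "Science & Technology", "Society & Culture",
]
REVISED = [
    "Agriculture", "Arts & Architecture", "Business & Industry", "Education",
    "Family & Community", "Government & Politics", "Land & Resources",
    "Leisure & Travel", "Military & War", "Native Americans",
    "Race & Ethnicity", "Religion", "Science & Technology",
    "Society & Culture", "Westward Movement", "Work & Labor",
]
# Categories revised for precision, as described in the text.
RENAMED = {
    "Arts": "Arts & Architecture",
    "Environment": "Land & Resources",
    "Exploration & Migration": "Westward Movement",
}

# Proposed categories as they appear after renaming.
carried_over = [RENAMED.get(name, name) for name in PROPOSED]
# Whatever remains in the revised list was newly added.
added = sorted(set(REVISED) - set(carried_over))
```

Here `added` contains exactly Family & Community, Leisure & Travel, and Work & Labor, which accounts for the growth from thirteen to sixteen categories.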

The principal investigators have also carried out preliminary work on metadata enhancement to support topical clustering and faceted browsing. Enriching the metadata involves extensive pre-processing and human intervention. They therefore propose further experimentation, perhaps by the DLF Aquifer Project, to determine the optimal balance between collaborative and local responsibilities, with the aim of automating classification upon ingest of harvested records and reducing the labor-intensive clustering needed to arrive at targeted topical terms (Landis 2006).
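
To make the idea of automated classification upon ingest concrete, here is a deliberately simple rule-based sketch. It is not the AmWest method, which the report does not specify: the keyword lists are invented for illustration, only four of the sixteen revised categories are covered, and a production tool would need far richer rules plus human review.

```python
# Hypothetical keyword rules mapping subject terms to broad topic categories.
TOPIC_KEYWORDS = {
    "Agriculture": ("farm", "ranch", "irrigation"),
    "Land & Resources": ("mining", "forest", "water"),
    "Westward Movement": ("trail", "migration", "homestead"),
    "Work & Labor": ("labor", "strike", "union"),
}
FALLBACK = "Society & Culture"  # catch-all when no rule fires


def assign_topics(subjects):
    """Return the broad topics whose keywords occur in a record's subjects."""
    text = " ".join(subjects).lower()
    topics = [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return topics or [FALLBACK]
```

A record with the subjects "Oregon Trail" and "Homesteading" would be filed under Westward Movement, while one with no matching term falls back to the catch-all category. Substring rules of this kind misfire easily, which is one reason the enrichment work remains labor-intensive.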

4.4.8 DLF Aquifer

Update Table 25: DLF Aquifer based on DLF Survey responses, Fall 2005
DLF Aquifer
ORGANIZATIONAL MODEL: Collaboration among subset of DLF membership.
SUBJECT: American culture and life
FUNCTION: To build and test library services that can be integrated into a variety of local environments
PRIMARY AUDIENCE: Academic Community
STATUS: Under development
SIZE: Under development
USE: Site not yet released.
ACCOMPLISHMENTS:
1. Receiving strong support from the DLF Board in the form of a dedicated staff member.
2. Developing processes for distributed, collaborative work.
3. Making an implementation plan and beginning to execute it.
CHALLENGES:
1. Resources: Difficult for participants to carve out time for this collaborative effort.
2. Diversity of expectations: Participant libraries are interested in emphasizing different facets of the project.
3. Flat organizational structure: DLF is a lean organization, which is both an advantage, allowing the initiative to test the limits of the network, and a possible limit.
TOOLS OR RESOURCES NEEDED: Outside funding that would allow dedicated project staff. Support for service model development to evaluate organizational effectiveness and to plan for sustainability.
GOALS OF NEXT GENERATION RESOURCE: Experiment with methods of aggregation other than metadata harvesting. Enable "deep sharing," the ability to move digital objects from domain to domain (e.g., modifying and re-depositing them in a different location in the process).

Leveraging the quality digital content developed by the Digital Library Federation (DLF) Libraries in American culture and life, the DLF Aquifer is a collaborative project, open to all DLF members, with fourteen current participating institutions, to build an open distributed library. DLF Aquifer will create a test bed of middleware tools and services to support the needs of digital library developers and scholarly end-users alike. To this end, Aquifer has four working groups (Collections, Metadata, Technical Architecture, and Services) along with a coordinating implementation group that sets policy. To date, Aquifer has completed a:

N.B. Link may be unstable due to experimental nature of this site.

(Adapted from Kott et al. 2005)

The key findings of the institutional survey along with corresponding Aquifer service responses are outlined below:

Table 27: Aquifer Institutional Survey Findings and Service Responses
Key Findings | Aquifer Service Response
Use of digital collections and services is often assessed at point of introduction or update, rather than systematically over time. | Developing an assessment model that can capture the nature of scholarly practice and the long-term integration and use of digital services and resources.
Searching is the most common way that digital collections are used. | Developing tools and services that support meta-searching.
Metadata standardization is the most commonly reported strategy for supporting digital collections. | Developing middleware tools that support metadata management activities such as migration, taxonomy assignment, and metadata enrichment.
Institutions and users desire cross-resource discovery tools and greater ability to personalize service options. | Developing tools and services that enable and enhance the integration of digital content into course management systems.
Budgetary, time and personnel constraints challenge the ability of institutions to develop needed services. | Pursuing collaborative collection development to leverage scarce resources. In addition, the other responses above will help to alleviate some of these constraints by supplying models and tools for needed services.

Source: Adapted from DLF-Aquifer Services Institutional Survey Report 2006, Executive Summary: 3-4.

Integral to this report is an annotated list of other user assessment instruments developed by DLF institutions, such as the American West surveys discussed above. These assessment activities are grouped into the following broad categories: Metadata Harvesting and Searching Portals, Collection Aggregation and Display, Navigating and Using Digital Object Collections, and Collecting and Analyzing Usage Data. Together they provide a strong foundation to inform future research about user services in the context of collaborative digital library development.

Three phases are envisioned to roll-out Aquifer service development priorities:

  • Phase 1: Leveraging institutional infrastructure
    • Metadata harvesting (via OAI-PMH)
  • Phase 2: Enhancing
    • Finding (known item/faceted searching via SRU/W)
    • Metadata remediation
    • Metadata enhancement
    • Taxonomy assignment
    • Browsing
    • Collecting
  • Phase 3: Deep sharing
    • Exporting
    • Searching full text
    • Integration with course management systems
    • Annotation
    • Focused crawling
The University of Michigan is hosting the DLF Aquifer portal; it tests out the MODS harvesting for DLF Aquifer collections. As of this writing, the Aquifer prototype Web site contains some 24,000 MODS metadata records contributed by the Library of Congress and Indiana University's Digital Library Program. Eventually, the collection will consist of 250,000 items representing a wide spectrum of media ranging from datasets and images to manuscripts and sheet music. The DLF Aquifer portal is intended to serve as an "administrative" portal, designed as a place for digital library developers to learn more about the DLF Aquifer collections and the richer metadata MODS harvesting provides.
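As a rough illustration of what Phase 1 harvesting involves, the sketch below builds an OAI-PMH ListRecords request for MODS records and reads one page of a response. It is a minimal sketch, not Aquifer's harvester: the base URL, the sample response, and the function names are assumptions made for illustration, and a given repository may expose MODS under a different metadataPrefix.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical one-page response, trimmed to the parts the parser reads.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="mods", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvester would request the first URL, parse the page, and keep requesting with the returned token until none is supplied.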

Source: OAIster http://oaister.umdl.umich.edu/o/oaister/ (April 27, 2006)

Source: DLF Aquifer http://www.hti.umich.edu/a/aquifer/ (April 27, 2006)

Developed from OAIster, the Aquifer portal features user interface improvements, including thumbnails; additional fields to search (e.g., language and institution); an additional resource type (e.g., dataset); and SRU functionality. Next steps include date normalization and subject clustering. Aquifer is also experimenting with another innovation, the "asset action package," designed "to support a consistent user experience and deeper level of interoperability across collections and repositories" (Kott et al. 2006). This allows multiple views of resources in an OAI context. In practice, it enables users to deploy locally available tools (e.g., for image manipulation, annotation, and saving) with disparately held content from other repositories that use "asset actions." [[174]]

4.4.9 SouthComb

Update Table 26: SouthComb based on DLF Survey responses, Spring 2006
URL not yet available. See http://www.metascholar.org/ for project forerunners.
ORGANIZATIONAL MODEL: Mixed model, currently comprised of grant funding and support from Emory University. Affiliated with Emory University's Robert W. Woodruff Library and the MetaScholar Initiative.
FUNCTION: Portal for Southern Studies research providing cross-resource search tools that harvest, automatically classify, and meta-search information combined from multiple resources (Web, OAI, and others).
PRIMARY AUDIENCE: Academic Community
STATUS: Under development as of May 2006.
SIZE: Site not yet released.
USE: Site not yet released.
ACCOMPLISHMENTS: This new service builds on achievements of prior work:
1. Refinement of metasearching, semantic clustering, and metadata assignment techniques (MetaCombine and Quality Metrics projects).
2. Development of a conspectus of Southern Studies digital archives.
3. Creation of Southern Spaces peer-reviewed Internet journal and its editorial board.
CHALLENGES: 1. Metadata format inconsistencies, particularly in describing the resource.
2. Metadata inadequacies, often leading to over-reliance on keyword searching.
3. Sustainability of service: managing the transition from project to ongoing program.
TOOLS OR RESOURCES NEEDED: Metadata normalization tools (some will be developed or deployed in this project).
GOALS OF NEXT GENERATION RESOURCE: As an expansion and improvement on Emory's previous OAI endeavors, SouthComb itself represents a next-generation metasearch service.

Leveraging an impressive series of digital initiatives to advance scholarly communication carried out under the umbrella of "MetaScholar," Emory University received funding in March 2006 from The Andrew W. Mellon Foundation to develop a "capstone" project that would encompass and build on previous metadata harvesting efforts. Tentatively named SouthComb, the project aims to create a sustainable interdisciplinary search portal targeted to Southern Studies research. SouthComb will implement all of the experimental techniques that Emory has developed for harvesting, automatically classifying, and metasearching information from OAI repositories, Web pages, and other scholarly resources. It will establish an advisory panel of Southern Studies scholars at various universities to review and select resources, thus allowing it to develop sub-portals tailored to meet the needs of particular academic programs. It also aims to widen the cadre of scholars contributing to Southern Spaces, an innovative, peer-reviewed Internet journal and scholarly forum created by Emory University. [[175]] SouthComb intends to advance regional collaboration by advising partner institutions in the use of OAI-PMH tools, such as the Metadata Migrator developed at Emory [[176]], and by extending participation in the MetaArchive preservation network. [[177]] To sustain its efforts over the long term, SouthComb expects to adopt a hybrid business model, consisting of a freely available basic Web resource and a more sophisticated version with advanced features, available on a cost-recovery basis via institutional subscriptions.

According to the principal investigators, lessons learned and tools developed during previous MetaScholar Initiative projects have contributed significantly to the ability to construct SouthComb. [[178]] They note the following building blocks:

As such, SouthComb will emerge as Emory University's next-generation OAI-based service, one that improves the quality of metadata records and the ease of searching and browsing across heterogeneous resources.

User studies conducted for the AmericanSouth project revealed a common desire to search across multiple resources, a finding echoed in other studies of the demands of interdisciplinary research. A hindrance to providing such a tool, however, was the lack of controlled or consistent subject vocabularies for many of these resources. The MetaCombine project, through its focus on semantic clustering techniques, sought to remedy the obstacles to subject browsing of such heterogeneous materials. In the process, the project discovered that focused crawling of Web sites (selectively crawling only Web sites that are relevant to the subject domain under consideration) greatly improved the quality of the results. The harvested results could then be classified according to semantic similarities and organized into taxonomies for easier browsing. These searching techniques, combined with newly developed systems for assigning metadata and visually displaying conceptual connections among records, form the core of the new SouthComb search system. [[180]]
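To make the idea of classifying harvested records by semantic similarity more concrete, here is a minimal sketch that assigns a record to the taxonomy category whose description it most resembles, using bag-of-words cosine similarity. This is an illustrative stand-in, not the MetaCombine algorithm: the seed terms for each category, the threshold, and the function names are all assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(record_text, categories, threshold=0.1):
    """Return the best-matching category name, or None below the threshold."""
    vec = term_vector(record_text)
    scored = [(cosine(vec, term_vector(desc)), name) for name, desc in categories.items()]
    best_score, best_name = max(scored)
    return best_name if best_score >= threshold else None


# Hypothetical seed terms for two conspectus-style categories.
CATEGORIES = {
    "Music in the American South": "music blues gospel songs sheet recordings",
    "Foodways in the American South": "food cooking recipes barbecue cuisine",
}
```

The same similarity score could, under the same assumptions, serve as a relevance test in a focused crawl: pages scoring below the threshold against every category would simply not be followed.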

Other features of the SouthComb search system, developed during the MetaScholar Initiative's Study of User Quality Metrics project, will allow researchers greater precision in identifying resources that are most useful to their work. Using focus-group data on how scholars in the sciences, social sciences, and humanities actually search for and identify quality digital resources for their work, the Quality Metrics project built a new prototype search system that permits scholars to discover resources using both explicit attributes (such as title, author, and other data that currently appear in library records) and implicit attributes (such as citations in journals, usage information from logs, and number of times included in electronic reserves, all latent indicators of the scholarly value of a resource). Which resource attributes are highlighted for Southern Studies researchers depends considerably on communications among scholars and the librarians and archivists who provide access to those resources and on focused conversations with scholars about ways they use resources.
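One way to picture a search that combines explicit and implicit attributes is a ranking function that adds a damped "quality" signal to an ordinary field match. The sketch below is only an assumption about how such a blend might look; the field names, the logarithmic damping, and the weight are hypothetical and are not taken from the Quality Metrics prototype.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    author: str
    citations: int = 0      # implicit: citations in journals
    log_uses: int = 0       # implicit: usage recorded in logs
    reserve_lists: int = 0  # implicit: inclusions in electronic reserves


def explicit_score(res, query):
    """Fraction of query terms found in the title or author fields."""
    terms = query.lower().split()
    haystack = f"{res.title} {res.author}".lower()
    return sum(t in haystack for t in terms) / len(terms) if terms else 0.0


def implicit_score(res):
    """Log-damped sum of the latent quality signals."""
    return math.log1p(res.citations) + math.log1p(res.log_uses) + math.log1p(res.reserve_lists)


def rank(resources, query, implicit_weight=0.2):
    """Order matching resources by explicit match plus weighted implicit signals."""
    scored = [(explicit_score(r, query) + implicit_weight * implicit_score(r), r)
              for r in resources if explicit_score(r, query) > 0]
    return [r for _, r in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Requiring a non-zero explicit match keeps a heavily cited but off-topic resource out of the results, so the implicit signals only reorder items that already answer the query.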

The synergistic opportunities continue through the SouthComb portal itself, particularly in its connection with Southern Spaces. As a foray into peer-reviewed digital scholarship, the Internet-only journal Southern Spaces has re-imagined the possibilities for digital publishing. Through gateways, events and conferences, interviews and performances, and essays that capture the Internet's multimedia potential, the journal's content models the types of scholarly products possible through digital collections and fuels innovations in digital scholarship.

Source: Mary Odem, Global Lives, Local Struggles: Latin American Immigrants in Atlanta. Available from http://www.southernspaces.org/contents/2006/odem/1a.htm

SouthComb hopes to achieve long-term sustainability by providing scholars with the resources they most need and desire. To that end, Emory's digital library has already constructed a Southern Digital Archives Conspectus (SDAC) that describes and provides access to the library- and museum-produced open access digital collections currently available on the topics of history, literature, and culture in the U.S. South from the Colonial Period to the present. [[181]]

Table 28: Southern Digital Archives Conspectus Classifications (with # of associated collections)
Agriculture and Industry in the American South (37)
Art and Architecture in the American South (52)
Education in the American South (61)
Environment in the American South (32)
Ethnicity in the American South (32)
Folk Art in the American South (8)
Folk life in the American South (17)
Foodways in the American South (4)
Gender in the American South (29)
Geography in the American South (37)
History, Manners, & Myth in the American South (228)
Language in the American South (4)
Law and Politics in the American South (44)
Literature in the American South (29)
Media in the American South (8)
Music in the American South (22)
Race in the American South (63)
Recreation in the American South (20)
Religion in the American South (22)
Science and Medicine in the American South (11)
Social Class in the American South (20)
Urbanization in the American South (26)
Violence in the American South (26)

Source: http://southconspectus.library.emory.edu/SPT--BrowseResources.php (May 2006)

This survey identifies unique collections that would be of great interest to Southern Studies scholars as well as gaps in the digital landscape that could inform future digitization and harvesting efforts. Building SouthComb is conceived as an on-going exercise in community identification and collaboration, leading to greater community investment in digital access to resources, digital scholarship, and digital preservation.

4.4.10 Perseus Digital Library

Update Table 27: Perseus Digital Library based on DLF Survey responses, Fall 2005
Perseus Digital Library
ORGANIZATIONAL MODEL: Tufts University, Classics Department, with NEH, NSF, and other public-private funders
FUNCTION: Evolving digital library of resources for the study of the humanities.
PRIMARY AUDIENCE: Interested public
SIZE: 1.1 million manually created and 30 million automatically generated links connect the 100 million words and 75,000 images. 850,000 reference articles provide background on 450,000 people, places, organizations, dictionary definitions, grammatical functions and other topics. [[182]] N.B.: Corpus comprised of <2,000 texts.
USE: April 2005: served more than 11 million pages to more than 400,000 unique users. [[183]]
ACCOMPLISHMENTS: 1. Rebuilt Perseus text system and released new Web site (Perseus 4.0).
2. Active development of named entity recognition system for historical texts.
3. Improved cataloging of resources including exploration of new standards (MODS, FRBR, etc.).
CHALLENGES: 1. Meeting needs of growing audience with limited resources, including providing adequate user support.
2. Ability to maintain current services while also implementing research agendas.
3. Implementing a digital preservation strategy.
TOOLS OR RESOURCES NEEDED: Exploring various open source tools to support automatic metadata generation, automatic ingestion of digital objects, and improved object relations management systems.
GOALS OF NEXT GENERATION RESOURCE: Eventual release of a named entity browser. Implementation of a distributed editing environment. N.B. Available as of May 2006 at http://www.perseus.tufts.edu/hopper/nebrowser.jsp

In May 2005, the Perseus Digital Library (PDL) released version 4.0 of its system software, facilitating interoperability and closer alignment with Web standards, including support for distributed catalog services based on MODS/MADS/SRU/OAI. The new technology offers capabilities such as:

  • Extraction of well-formed XML fragments of primary sources with full TEI-conformant markup permitting developers to create their own front ends;
  • Hierarchical (FRBR) catalog (Mimno et al. 2005);
  • Discrete XML services for morphological analysis, tables of contents, chunking of larger documents into smaller units, and various categories of searching; and
  • Clearer and readily documented API. Tools. [[184]]
Integral to the new technology plan, PDL ushered in another fundamental change: the migration of its core data to the Tufts Institutional Repository, a Fedora object-based architecture better suited to long-term preservation and access, thereby allowing PDL to concentrate on research and development activities. Under the new arrangement, once PDL research applications have proven their viability, they will move to the institutional repository's production server.

In recent months, PDL has also released significant new content related to 19th-century American documents, taking advantage of new technologies and services (Crane 2006). The user interface permits navigation through texts by "chunking" documents by chapters, parts, pages or tables of contents, and automatically extracts salient places, people and dates for immediate viewing.

At the time of this writing, PDL's content appears to be betwixt and between the original Web site and the new release. [[185]] It is difficult to correlate the collection overlap (5 at the original site, 4 at the new site) because they are identified by different names.

Table 29: Collections at original and new release of Perseus Digital Library (April 10, 2006)
COLLECTIONS (http://www.perseus.tufts.edu) | Total Words | Texts | Secondary Sources | Museum Photography | Tools
Classics: Greek, Latin, Archaeology | 52,817,833 | 4891121668
Duke Databank of Documentary Papyri | 3,796,476 | 2751
English Renaissance: Shakespeare, Marlowe | 11,294,934 | 806
London: Bolles Collection | 13,517,917 | 351
American Memory: California | 12,799,122 | 186
American Memory: Upper Midwest | 16,248,751 | 140
American Memory: Chesapeake | 6,937,628 | 142
Tufts History since 1852 | 771,114 | 11
Boyle Papers: History of Science | 285,357 | 47
COLLECTIONS (http://www.perseus.tufts.edu/hopper/collections) | Total Words
Duke Databank of Documentary Papyri | 3,791,687
Germanic Materials | 758,202
19th-century American | 56,140,360

As evident from the screenshot below, when a particular text is selected, relevant Places, People and Dates are automatically extracted and linked (right-hand frame). The text can also be navigated by chapters and table of contents from the left-hand frame.

Source: http://www.perseus.tufts.edu/hopper/ (April 10, 2006)

Developers interested in the evolving technology piloted at PDL should refer to Crane (2006) and related publications at the Web site. In addition to the eventual release of a named entity browser, the principal investigators are researching the implementation of a distributed editing environment whereby users may correct errors, comment on topics, create custom commentaries, user guides, discuss issues with other users, and personalize the Perseus experience.

4.4.11 NINES: Networked Interface for Nineteenth-Century Electronic Scholarship

Update Table 28: NINES based on DLF Survey responses, Fall 2005
Networked Interface for Nineteenth-Century Electronic Scholarship http://nines.org/
ORGANIZATIONAL MODEL: Sponsored by ALA, ASA, NAVSA, NASSR, and SHARP, with headquarters at the University of Virginia.
FUNCTION: To provide an online venue for aggregating peer-reviewed scholarly work in British and American literary and cultural studies in the 19th century; to develop a general model for such work; and to facilitate new scholarship using digital tools.
PRIMARY AUDIENCE: Academic community
STATUS: Released to the public in December 2005.
SIZE: The 2005 release aggregates The Rossetti Archive, The Swinburne Project, Romantic Circles (in part), The Poetess Project, and The Walt Whitman Archive. Additional releases are described below.
USE: Not yet available.
ACCOMPLISHMENTS: 1. The establishment of the editorial boards and steering committee.
2. The creation of the implementation design for aggregating materials located on distributed institutional servers.
3. The creation of high-level interpretive tools (Collex, Juxta, and IVANHOE) for use within the NINES environment.
CHALLENGES: 1. Funding to sustain the developing infrastructure.
2. Funding to move paper-based journals that want to become part of the NINES project to online operations.
3. Getting major professional organizations (in this case, MLA particularly) to move into active sponsoring mode.
GOALS OF NEXT GENERATION RESOURCE: The major goals are to overcome the three obstacles listed above.

Established in 2004, NINES is a scholarly collective formed to promulgate peer-reviewed digital scholarship in 19th-century British and American cultural and literary studies. Headquartered at the University of Virginia under the leadership of Jerome McGann (John Stewart Bryan University Professor and editor of the acclaimed hypertext project The Rossetti Archive), NINES is sponsored by five scholarly societies:

  • NASSR: North American Society for the Study of Romanticism
  • NAVSA: North American Victorian Studies Association
  • ASA: the American Studies Association
  • ALA: the American Literature Association
  • SHARP: the Society for the History of Authorship, Reading & Publishing
More than a dozen other influential humanities computing centers, technical organizations, and digital humanities projects are affiliates.

Guided by a Steering Committee and three domain-specific Editorial Boards, NINES aims to (1) create a shared information management system to coordinate the process of submitting, peer-reviewing and certifying the integration of digital work into NINES and (2) develop a set of customized tools to facilitate knowledge discovery and interpretation (McGann and Nowviskie 2005, 12).

In December 2005, NINES launched a pilot implementation, comprising 24,975 peer‑reviewed digital objects, aggregated from six digital projects:

The June 2006 release adds another 13,000 new objects. A later release, scheduled for fall 2006, will bring in another 30,000 new objects from The Whistler Correspondence (http://www.whistler.arts.gla.ac.uk/correspondence/) and The Dickinson Electronic Archives (http://www.emilydickinson.org/). Consultation is underway to integrate other online resources into the aggregation. All contributions are vetted through the NINES editorial apparatus prior to their release. The technology described below enables users to browse and search collections; registrants can collect and annotate selected search results.

The NINES technology plan has evolved from a centralized, hierarchical approach requiring compliance with a monolithic set of governing standards for text markup, metadata, interface, and archiving to a more flexible, collaborative and non-hierarchical design, relying on RDF (Resource Description Framework) syntax to facilitate description and semantic integration of NINES resources. NINES uses a customized open-source indexing system and the Lucene search engine, customized to integrate faceted browsing. COLLEX, a tool developed by NINES, serves as the backbone of the system that "brings this indexing and search design framework into a collaborative research environment" (Ibid, 15). COLLEX "leverages current developments in folksonomy and semantic-web technology to perform data mining operations and enhance knowledge discovery," . . . leading scholars and students "to see connections among digital objects, based on the contexts into which those objects have been placed (implicitly or explicitly) by past scholarly activity in the system" (Ibid, 15-16). Users of COLLEX can:

  • 1. collect, tag, analyze, and annotate trusted objects (digital texts and images vetted for scholarly integrity);
  • 2. reorganize and publish objects in fresh critical perspectives;
  • 3. share these new collections with students and colleagues, in a variety of output formats; and,
  • 4. without any special technical training, produce interlinked online and print exhibits using a set of professional design templates. (Ibid, 16) [[186]]
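The combination of faceted browsing and user-created tags described above can be sketched in a few lines. This is a toy, in-memory illustration rather than the COLLEX or Lucene implementation: the record fields, the sample tags, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative, not the NINES schema.
OBJECTS = [
    {"title": "The Blessed Damozel", "archive": "Rossetti", "genre": "Poetry", "tags": {"reflection"}},
    {"title": "Song of Myself", "archive": "Whitman", "genre": "Poetry", "tags": set()},
    {"title": "Hand and Soul", "archive": "Rossetti", "genre": "Fiction", "tags": {"reflection", "art"}},
]


def apply_constraints(objects, constraints=None, tag=None):
    """Keep objects matching every facet constraint and, optionally, a user tag."""
    constraints = constraints or {}
    return [o for o in objects
            if all(o.get(f) == v for f, v in constraints.items())
            and (tag is None or tag in o["tags"])]


def facet_counts(objects, facet):
    """Count how many objects fall under each value of one facet."""
    return Counter(o[facet] for o in objects)
```

After each constraint is applied, recomputing `facet_counts` on the narrowed result set gives the counts a faceted sidebar would display next to each remaining choice.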
Two screenshots from the COLLEX prototype appear below. The first presents the initial view into the NINES browser with COLLEX sidebar incorporated. Users see the featured NINES exhibits at the top of the screen, have access to the most popular folksonomy tags, and have access to faceted browsing and search. The second screenshot is a view of the system after some constraints have been introduced in the faceted browser. More specifically, it depicts browsing a user-created tag ("reflection") in the sidebar.

Source: Screenshots provided by NINES' developers (May 8, 2006)

Source: Screenshot provided by NINES developers (May 8, 2006)

In addition, NINES has developed two other interpretative applications that can be used in tandem with COLLEX or independently. Juxta is a collation and text comparison tool with analytical visualization capability (released for testing in February 2006). IVANHOE is a collaborative interpretative play space, especially designed for pedagogical use [[187]] (McGann 2005).

NINES aims to increase participation in digital scholarship by awarding competitive fellowships during the summer to train scholars who are developing digital projects and by working with journal editors to facilitate the migration of paper-based journals to digital or hybrid formats. As of this writing, NINES is seeking grants to extend its development for another two years and also hopes to gain endorsement from the Modern Language Association of America.

4.4.12 Current Issues and Future Directions

Synergies among these services are apparent in terms of sharing collections (Heritage West/American West; DLF Aquifer/American West/SouthComb), metadata schema (Cornucopia/IMLS DCC), user interface and search systems (IMLS DCC/DLF Digital Collections Registry), and tools development (American West/DLF Aquifer/SouthComb). In these and other ways, the members of this cohort collaborate and build on each other's work, directly or indirectly. They represent, however, different models of achieving organizational sustainability.

The more complex services, such as DLF Aquifer, SouthComb, and NINES must engage at least three different communities of practice: scholarly and disciplinary circles; digital library technical domains; and e-learning and/or e-research service communities. For these aggregation services to flourish over the long-term, they have to be cognizant of the needs and trends across all three sectors. NINES, for example, garnered support from a host of relevant scholarly societies, humanities digital computing centers and other digital libraries in addition to establishing editorial boards charged with peer-review oversight for content. While it is scholar-led, the service itself is embedded in a library setting at the University of Virginia. DLF Aquifer, on the other hand, operates in a decentralized manner where participants agree to terms spelled out in a business plan, collection development policy and other technical specifications. It is driven by the DL community and loosely informed by scholars, with the intention of building prototype tools and services that can be applied at different institutions to meet their particular needs. SouthComb has a multi-faceted organizational structure, with scholars taking the lead for some components, such as the journal Southern Spaces, while the library develops the tools and finding system. The library and scholars work in tandem to identify new content and bring it into the aggregation. A challenge facing all three of these collaborations is how to achieve a reputation of sufficient stature that other scholars and libraries are willing to contribute their time (for example, peer review or tool development), content, scholarship, and financial resources outside their local institutional setting. In short, are the benefits of collaboration, fruits of cooperative labor, and reward system adequate to carry the day?

It is important to acknowledge that virtually all of the services under review in this section play a major role in empowering other data providers to achieve interoperability through OAI implementation and promulgation of best practices. Projects like Cornucopia, IMLS DCC, and Heritage West provide constituent services with the tools they need to maintain control over their own information environments while also fostering their ability to contribute to aggregations. In this way, they have helped to increase the quantity and quality of data providers.

Representatives from many of these services (e.g., IMLS DCC, American Memory, American West, SouthComb, DLF Aquifer) are directly involved not only in developing the "Best Practices for OAI Data Provider Implementations and Shareable Metadata," (a joint DLF and NSDL initiative), but also in creating the means to achieve consistent and adequate metadata. Nevertheless, many of them note the unmet challenge of having sufficient automated and semi-automated tools at their disposal to enhance and remediate metadata for scholarly use. A particular challenge and focus of activity among these services is devising methods to achieve subject classifications, thematic groupings or topical clustering across large, heterogeneous collections. In July 2006, the Digital Library Federation and The Andrew W. Mellon Foundation are sponsoring "The Metadata Enhancement and OAI Workshop" at Emory University where DL specialists will examine automated and semi-automated strategies for metadata enhancement and remediation scenarios involving the OAI protocol. Some of the scenarios being considered include normalization of date and format fields and taxonomy generation/assignment. The workshop will result in an agenda for specific experiments to assess various scenarios collaboratively, especially as part of the DLF Aquifer project.
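Date normalization, one of the remediation scenarios mentioned above, lends itself to a small worked example. The sketch below maps a few common free-text date patterns to a year range; the patterns handled and the function name are assumptions chosen for illustration, and real harvested metadata would need many more cases.

```python
import re


def normalize_date(raw):
    """Map a free-text date to a (start_year, end_year) tuple, or None."""
    text = raw.strip().lower()
    # Explicit ranges such as "1861-1865".
    m = re.search(r"\b(\d{4})\s*-\s*(\d{4})\b", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    # Decades such as "1920s".
    m = re.search(r"\b(\d{3})0s\b", text)
    if m:
        start = int(m.group(1)) * 10
        return start, start + 9
    # Single years, including "ca. 1900" or "[1900?]"; the qualifier is dropped.
    m = re.search(r"\b(\d{4})\b", text)
    if m:
        year = int(m.group(1))
        return year, year
    return None
```

Returning a range rather than a single value lets an aggregator sort and limit by date even when contributors record decades or spans, while `None` flags records that still need manual or semi-automated review.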

Other common challenges revolve around standardization of terminology, multilingual metadata and search support, and aligning collections with their associated items in meaningful contexts.

Future generation plans include new user interfaces that enable side-by-side comparison of documents, more interactive features and interpretative tools, the capacity to move complex digital objects from domain to domain, and the ability to migrate core data to production sites where preservation services are also available.

Finally, funding strategies (aside from grant and foundation funding) to ensure long-term viability are a common concern for these services. Heritage West has a membership fee structure. DLF Aquifer's business plan includes a provision to consider fee-for-service components after its initial development, and SouthComb will make some of its services available through institutional subscriptions. Ultimately, the longevity of this cohort rests on how well it meshes with pedagogical practices, e-scholarship, and lifelong learning pursuits.

4.5 User Alchemy: Discover, Deliver, Divine

First Choice for Information, by College Students across All Regions

"Which source/place would be your first choice?"

Search engines | 72%
Library (physical) | 14%
Online library | 10%
Bookstore (physical) | 2%
Online bookstore | 2%

Source: OCLC 2006, A-20.

The services under review in this section are all attempting to distinguish themselves from generic but hugely popular search engines by customizing their approach to meet the needs of the academic community. From niche search engines to customized and "accessorized" portals, they are components of evolving finding systems that move beyond discovery to the delivery and re-use of digital content. They represent a progressive spectrum of solutions to integrating search results from federated searching to metasearch systems, differentiated by "just-in-case processing" versus "just-in-time processing" (Sadeh 2006).

The review starts with Scirus, a federated search service that has increased its coverage significantly since 2003 by extending Web crawling and indexing to a much broader array of subjects. It moves on to Infomine, a collaboratively developed index and catalog of expert-selected and robot-retrieved Internet resources. Next comes Intute, a service from the UK that, along with its transition to this new name, hopes to become a trusted, first-choice Internet "mentor" and filter for quality information. More than a search engine, Intute is embedding its resources and services into a variety of teaching and research environments. Finally, the California Digital Library's Metasearch Initiative represents a coordinated and multi-faceted digital infrastructure that integrates all resources, irrespective of origin, host location or protocol, into user-controlled service environments. The latter two projects, not yet fully deployed, show the promise of how various standards and best practices come together in service to the academic community.

4.5.1 Scirus

Update Table 29: Scirus based on DLF Survey responses, Fall 2005
Scirus for Scientific Information Only
SUBJECT: Sciences [and other scholarly information]
FUNCTION: Multidisciplinary search engine, focusing on science
PRIMARY AUDIENCE: Research Community
SIZE: Crawls over 217 million science-related pages, consisting of 179 million Web pages, as well as 38 million records from both proprietary and OAI-compliant sources (including journals, institutional repositories, patents, e-prints from arXiv, technical reports from NASA, etc.)
USE: Per day: > 115,000 searches.
ACCOMPLISHMENTS: 1. Significant increase in the size and variety of content types in Scirus's index.
2. Improvements in indexing process and content classification.
3. Improvements in user interface.

Scirus [[188]], Elsevier's award-winning search engine [[189]], continues to grow in content, types of information, and functionality. "How Scirus Works" (updated in August 2004) describes its process of gathering and classifying data into its index; it also explains search functionality, ranking and search refinement. [[190]] Scirus combines focused Web crawling, based on a "seed list" of URLs manually checked for scientific content, with database loads from its partners (e.g., ScienceDirect, MEDLINE, LexisNexis) and OAI harvesting (e.g., from arXiv.org, RePEc, NDLTD). As of early March 2006, it boasted more than 250 million Web pages with the majority derived from educational institutions; the slowest growth in representation is from commercial sites.

Table 30: Scirus Web Page Counts by Domain (March 17, 2006)
Domain | August 2003 | March 2006
.edu sites | 45 million | 83 million
.com sites | 18 million | 22 million
.org sites | 14.8 million | 25 million
.ac.uk sites | 5.5 million | 10 million
.gov sites | 4.7 million | 6.5 million
Other STM and university sites around the world | Over 40 million | Over 68 million

Source: Scirus "about" page.

Since 2003, Scirus has considerably augmented its journal and "preferred Web sources" content from a combination of subscription-based (e.g., Crystallography Journals Online, Institute of Physics, Scitation) and freely available OAI sources (e.g., PubMed Central, DiVA, MIT OpenCourseWare, NDLTD, RePEc). (It has also dropped several sources, including Beilstein Abstracts and Elsevier's own Chemistry, Mathematics and Computer Science Preprint Archives. These are available on a subscription basis via Chemweb, Elsevier's chemistry portal, and other Elsevier portals.)

While its coverage is strongest in the sciences (especially its journal sources), Scirus's subject scope now expands across all disciplines due to the inclusion of multidisciplinary content from ETDs, academic OAI repositories, and broader Web crawls. The "Digital Archives" category listed in the table below currently consists of records from Organic eprints and the UMDL. Elsevier expects this category to expand significantly in the course of the year. Scirus now also indexes news sources and offers news results in a dedicated section at the bottom of the results page. The feature includes news items from the last 30 days and ranks results by relevance and date. Up-to-date news from the New Scientist is also available directly via a link on the home page.

Table 31: Titles and Record Counts of Scirus's Proprietary and OAI Sources (March 17, 2006)
Journal Content with Number of Full-Text Articles (or Citations):
  BioMed Central: 6,515
  Crystallography Journals Online: 56,310
  Institute of Physics: [207,000]
  MEDLINE/PubMed: 15.2 million citations
  PubMed Central: 285,500
  Project Euclid: 28,510
  ScienceDirect: 5.6 million
  Scitation: 318,760
  SIAM (Society for Industrial & Applied Mathematics): 7,300

Preferred Web Sources (e-prints, technical reports, ETDs, patent data, course materials):
  arXiv.org: 311,065
  Caltech: 3,058
  DiVA: 1,500
  Cogprints: 2,175
  MIT OpenCourseWare: 33,050
  NASA: 12,265
  NDLTD: 149,381
  Patent Offices data from esp@cenet (European Patent Office) and the US Patent and Trade Office or via LexisNexis platform: 13 million
  RePEc: 163,800
  University of Toronto T-Space: 2,080
  Digital Archives [[191]]: Organic eprints [4,360], UMDL (University of Michigan Digital Library) [198,000]

Source: Sources are from search categories available from the Advanced Search page. Record counts are from the "about" page. Figures in brackets and additional information about patent sources are from email correspondence with Sharon Mombru on March 17, 2006.

Users can perform a search within or across three broad categories: all journal sources, preferred Web sources, or other Web sources. Scirus indexes Web pages and their relationships, classifying the content by subject and information type using a collection of dictionaries with more than 1.6 million scientific terms, pattern recognition tools, and linguistic analysis. This enables users to limit searches by eight different information types and twenty subject areas as well as six file format types.

Table 32: Scirus Delimiters: Information Types, File Types, and Subjects (March 17, 2006)
Information Types
Company homepages
Scientist homepages
File Types
Subjects
Agricultural and Biological Sciences
Chemistry and Chemical Engineering
Computer Science
Earth and Planetary Sciences
Economics, Business and Management
Engineering, Energy and Technology
Environmental Sciences
Languages and Linguistics
Life Sciences
Materials Science
Social and Behavioral Sciences

Source: Scirus Advanced Search page

Searches can be narrowed to particular authors, journals or titles and restricted to specified date ranges. A sample query for journal articles published in the "Institute of Physics" with the keyword "laemmli," returns results with the search term highlighted (in this case, it appears among the article's references) and clearly indicates the source of the published article.

Predicting the function of eukaryotic scaffold/matrix attachment regions via DNA mechanics
Ming Li / Zhong-can Ou-Yang, Journal of Physics: Condensed Matter, Aug 2005
...near future to elucidate how universal our hypothesis is. References [1] Freeman M 2000 Nature 408 313 [2] Paulson J R and Laemmli U K 1977 Cell 12 817 [3] Phi-Van L and Strätling W H 1988 EMBO J. 7 655 [4] Levy-Wilson B and Fortier C 1989 J. Biol. Chem... Published journal article available from

view all 3 results from Institute of Physics Publishing

similar results

Scirus automatically performs "intelligent query rewrites," suggests "did you mean?" queries, and lists alternative keywords to refine or expand searches. Search results are returned according to relevance ranking (determined by an algorithm that takes into account word location and frequency as well as the number of links to a page) or date. Users can refine, customize or save searches and email or export selected search results to their reference management application.

An Advanced Search for:

EXACT PHRASE <avian influenza>


ALL THE WORDS <pandemic>

Is automatically rewritten as a Basic Search for:

<"avian influenza" AND (pandemic)>

It retrieves 14,769 total results including 595 journal results, 29 preferred Web results, and 14,145 other Web results. Terms to refine the search are located in the right-hand margin. Several sponsored links follow from commercial suppliers of products for avian flu protection.
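The rewrite shown above can be sketched in a few lines of Python. This is only an illustration of the visible input and output, not Scirus's actual code: the function name and its two parameters are hypothetical, and joining several "all the words" terms with AND is an assumption, since the example shows a single term.

```python
def rewrite_advanced_query(exact_phrase="", all_words=""):
    """Combine two Advanced Search fields into one Basic Search string.

    Hypothetical helper: it mirrors the single example in the text and
    makes no claim about how Scirus handles other field combinations.
    """
    clauses = []
    if exact_phrase.strip():
        # An exact phrase becomes a quoted string.
        clauses.append('"' + exact_phrase.strip() + '"')
    if all_words.strip():
        # Assumption: every word must match, so the words are ANDed together.
        clauses.append("(" + " AND ".join(all_words.split()) + ")")
    return " AND ".join(clauses)


print(rewrite_advanced_query("avian influenza", "pandemic"))
```

For the sample query this prints `"avian influenza" AND (pandemic)`, matching the rewritten Basic Search above.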

According to Elsevier representatives, "Scirus indexes sources of STM-relevance in the broad sense of the world-scientific, technical, medical, social sciences, etc." [[192]] With its more expansive subject scope, Scirus may need to revise its qualifier from "for scientific information only" to "for scholarly information only." The Scirus toolbar and customizable search query boxes (for general searches or limited by subject and other fields) can be added to external Web sites. The Scandinavian aggregator of academic repositories, DiVA (which is harvested by Scirus), for example, offers users three search options at its Web site, including the ability to use the Scirus search engine (restricting the query to DiVA content, preferred Web sources, or all of the scientific Web). [[193]]


4.5.2 INFOMINE

Update Table 30: INFOMINE based on DLF Survey responses, Fall 2005
ORGANIZATIONAL MODEL: UC-Riverside and national network of libraries w/ IMLS and NSDL funding
FUNCTION: Virtual library of expert and machine-gathered scholarly Internet resources
PRIMARY AUDIENCE: Academic community
SIZE: 210,000 resources (110% increase) of which an estimated 17,000 have associated full-object representation
USE: Average successful requests for pages per day: 2,190. Per month: 66,795. Per year: ~800,000
ACCOMPLISHMENTS:
1. Populating the database with robot records, mostly created from the iVia virtual library crawler and machine-generated metadata (using iVia classifiers).
2. Using new versions of iVia open source software that has increased the accuracy of its classifiers and focused crawlers.
3. Collaborating and sharing metadata with other projects.
CHALLENGES:
1. Continued funding of programmers and metadata specialists.
2. Sustaining an active level of participants in the INFOMINE collecting cooperative in various subject areas.
3. Increasing the level of expert & robot records in the collection of INFOMINE Scholarly Internet Resources.
TOOLS OR RESOURCES NEEDED: New versions of improved classifiers and crawlers to help scale with the increase of scholarly Internet resources.
GOALS OF NEXT GENERATION RESOURCE:
1. More customizable features for INFOMINE users.
2. Expanding subject areas.
3. Harvesting and sharing metadata with other digital and virtual libraries.
4. Providing more, rich full-text.
5. Continue to improve the iVia open source software.

As reported in 2003, INFOMINE is a national collaborative project led by the University of California-Riverside (UCR) to create a virtual library of scholarly Internet resources, utilizing the open source iVia software platform. INFOMINE provides access to more than 200,000 freely available and commercial resources organized into nine subject areas and covering a wide spectrum of media formats. The INFOMINE database represents a hybrid approach to collection building, relying on a combination of contributions from library subject specialists and focused Web crawling. As a result, searches can be limited to "expert-selected" or "robot-selected" items. Expert-selected content constitutes less than 20 percent of the total content. [[194]]

The flexible, modular system is designed to facilitate cooperative collection-building of the centralized database while also providing institutions with the tools they need to develop customized Internet resource discovery systems with local branding and incorporation of proprietary materials. MyInfomine (also known as MyI) supports building sub-collections of INFOMINE and enables contributors to create MyInfomine Categories, add records to these categories, and perform searches on them. [[195]] For example, librarians at the University of California-Riverside have created MyI categories to create course-specific Internet resource guides as well as to track medical indexes and databases.

iVia (http://ivia.ucr.edu) is an open source system for automatically and semi-automatically building library-related metadata and rich text collections of Internet-available resources. Web-based virtual libraries like INFOMINE, subject portals, and catalogs all benefit from it. The codebase (250k lines of C++) has been designed by and for librarians and computer scientists at the University of California, Riverside, the NSDL, Library of Congress, Cornell University Library and others. The goal of the software is to amplify expert effort in collection building and foster collaboration.

iVia supports automated metadata generation to assign Library of Congress Subject Headings and LC Classifications to resources (Mitchell et al. 2004, Paynter 2005). Building on iVia, Data Fountains, currently under development, identifies itself as "a national, cooperative information utility for shared Internet resource discovery, metadata application and rich, full-text harvest of value to Internet portals, and library catalogs with portal-like capabilities." It uses expert-guided and focused crawlers supported in semi-automated (requires expert refinement) and fully automated modes (Mitchell 2006). The expert (or manually) guided crawler drills down from a given URL, whereas the focused crawler uses techniques of co-citation and similarity analysis to identify intensely interlinked and high value resources in a subject. The "Nalanda iVia Focused Crawler" features an "apprentice learner" program that enables it to follow the most promising links; crawling is also improved by utilizing a combined HITS and PageRank algorithm (Ibid).
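To make the link-analysis idea concrete, the following is a minimal sketch of the standard HITS (hubs and authorities) iteration on a toy link graph. It is not iVia or Nalanda code, and it does not attempt the combined HITS and PageRank variant mentioned above; the page names and the fixed iteration count are assumptions for illustration only.

```python
def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph.

    `links` maps each page to the list of pages it links to.
    Plain textbook HITS; illustrative only, not the crawler's algorithm.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: two subject guides and a blog link to resources.
toy_links = {
    "guide_a": ["dataset", "journal"],
    "guide_b": ["dataset", "journal"],
    "blog": ["dataset"],
}
hub_scores, authority_scores = hits(toy_links)
best_authority = max(authority_scores, key=authority_scores.get)
```

In this toy graph the most heavily cited page ("dataset") receives the highest authority score, and the two guides that point to both resources are stronger hubs than the blog. A focused crawler can use such scores to decide which links are worth following first.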

In January 2004, iVia received a two-year sub-contract from NSDL to integrate iVia software into NSDL's Core Integration efforts. This has enabled NSDL to harvest item-level metadata from iVia's server for selected NSDL collections that did not include detailed metadata. By October 2004, NSDL reported that they successfully submitted a URL to iVia's Expert Guided Crawl Service and reviewed the results to delete inactive or irrelevant sites, then harvested the metadata using OAI. [[196]] Phipps, Hillmann and Paynter (2004) discuss NSDL's service interaction with INFOMINE, enabling "loosely-coupled third party services to provide metadata enhancements to a central repository."
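The OAI harvesting step described above follows the OAI-PMH request and response pattern. The sketch below builds a ListRecords request URL and extracts identifiers, titles, and the resumption token from a response. The base URL, the sample record, and the function names are hypothetical; only the verb, the parameter names, and the XML namespaces come from the OAI-PMH 2.0 specification. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if token:
        # A resumptionToken is an exclusive argument: no other filters allowed.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response fragment, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample resource</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with each returned resumption token until none is supplied, which is how a large set of item-level records is transferred in batches.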

INFOMINE's advanced user interface supports searches that can be restricted by fields (author, title, subject, keyword, description, full text, and MyInfomine) or by broad subject area as well as restricted by source (expert-created or expert- and robot-created), access (all, free, or fee-based), and type (e.g., article databases, datasets, patents, preprints and working papers, etc.). Users can select the length, number, and order of the results' display. In addition they can browse within several indexes-including LC subject headings and classifications-by keyword, author, title, or what's new (entries added in the last 20 days). Although users can submit comments about resources, there are no other post-processing functions such as saving, downloading, or emailing search results.

INFOMINE's search tips are exemplary and include a succinct, yet extensive review of how to combine Boolean and proximity operators.

Table 33: Combining Boolean and proximity operators in INFOMINE
Search Statement | Executed as
A and B or C | (A and B) or C
A or B and C | A or (B and C)
(A or B) and C | (A or B) and C
A or B and C and not D | (A or (B and (C and not D)))
C and not D and A or B | (((C and not D) and A) or B)
((A or B) and C) and not D | ((A or B) and C) and not D
C and not D and A near3 B | (C and not D) and ((A near3 B))

Source: Infomine Search Tips (April 2006)
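The grouping in Table 33 is consistent with an operator precedence in which proximity binds most tightly, followed by "and not," then "and," then "or." The sketch below encodes that ordering in a small precedence-climbing parser that prints the fully parenthesized form of a query. The precedence table is inferred from the seven examples, not taken from INFOMINE's code, and the function names are hypothetical; the sketch also parenthesizes every operation, so its output can carry more parentheses than the table shows.

```python
import re

# Higher number binds tighter. Inferred from Table 33; an assumption.
PRECEDENCE = {"or": 1, "and": 2, "and not": 3, "near": 4}


def operator_kind(token):
    """Map a token to its operator family, or None for terms and parentheses."""
    lowered = token.lower()
    if lowered in PRECEDENCE:
        return lowered
    if re.fullmatch(r"near\d+", lowered):
        return "near"
    return None


def tokenize(query):
    """Split a query into parentheses, operators, and terms."""
    tokens = []
    for word in re.findall(r"\(|\)|[^\s()]+", query):
        if word.lower() == "not" and tokens and tokens[-1].lower() == "and":
            tokens[-1] = "and not"  # treat "and not" as a single operator
        else:
            tokens.append(word)
    return tokens


def parse_atom(tokens):
    """Parse a single term or a parenthesized sub-expression."""
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return token


def parse(tokens, min_precedence=1):
    """Precedence climbing over left-associative binary operators."""
    left = parse_atom(tokens)
    while tokens:
        kind = operator_kind(tokens[0])
        if kind is None or PRECEDENCE[kind] < min_precedence:
            break
        operator = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[kind] + 1)
        left = "(" + left + " " + operator + " " + right + ")"
    return left


def explain(query):
    """Return the fully parenthesized reading of a query."""
    return parse(tokenize(query))
```

For example, `explain("A or B and C and not D")` returns `(A or (B and (C and not D)))`, the same grouping as the fourth row of the table. The sketch performs no error handling for unbalanced parentheses or empty queries.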

Three sample queries to find resources relevant to "OAI-PMH," "metadata," and "access within four words of knowledge," show the wide variation in results retrieved from INFOMINE and other general metasearch and cross-archive search engines. Overall, INFOMINE's results are the narrowest (or most refined) but in some instances, such as the query to find resources relevant to OAI-PMH, they appear too limited. INFOMINE's coverage of metadata is stronger and it is the only search engine to support proximity operators. However, it shares the unsolved problem of duplicate entries with OAIster (and probably with the other services as well). This primitive exercise demonstrates the need for a more thorough study to better understand the strengths of these service entities (e.g., Scirus's coverage of journal articles). It also underscores the reason why users need a thorough understanding of the universe covered by these search engines (cum databases) and the need for "nutrition and ingredient labeling" as discussed elsewhere in this report and proposed by Péter Jacsó in 1993.

Table 34: Comparative Search Results: INFOMINE, RDN, OAIster, Scirus & Google Scholar

INFOMINE
  <OAI-PMH>: 1 expert-selected record (Emory University's MetaScholar Initiative); 10 robot-selected records (including articles from D-Lib and Ariadne)
  <Metadata>: 230 expert-selected records; 584 robot-selected records
  <access near4 knowledge>: 15 expert-selected records; 78 robot-selected records including many duplicates from Knowledge Management Think Archive and MayoClinic.com

RDN
  <OAI-PMH>: 3 results (Grainger Engineering Cross-Archive search service, OAIster, and Project Euclid)
  <Metadata>: 147 results
  <access near4 knowledge>: does not support proximity searching, phrase control or AND operator; <access knowledge> returns 468 entries; <access OR knowledge> returns 22,961

OAIster
  <OAI-PMH>: 181 items including noticeable duplication
  <Metadata>: 143,392 items; >128,700 from 3 institutions w/ "metadata" in name
  <access near4 knowledge>: does not support proximity searching; <access AND knowledge> retrieves 6,339

Scirus
  <OAI-PMH>: 9 journal results; 2,235 preferred Web results (when Hong Kong U of Science & Technology (HKUST) is excluded, results are reduced to 77 items); 10,977 other Web results (without HKUST results are reduced to 6,766)
  <Metadata>: 2,263 journal results; 16,729 preferred Web results; 962,865 other Web results
  <access near4 knowledge>: does not support proximity searching; <access AND knowledge> returns 107,589 journal results, 69,138 preferred Web results, and 3,359,813 other Web results

Google Scholar
  <OAI-PMH>: 1,420 items
  <Metadata>: 266,000 items
  <access near4 knowledge>: does not support proximity searching; ALLTHEWORDS: <access knowledge> returns 1,830,000 results

4.5.3 Intute (formerly RDN-Resource Discovery Network)

Update Table 31: Intute based on DLF Survey responses, Fall 2005
Intute (as of mid-2006), formerly Resource Discovery Network (RDN)
ORGANIZATIONAL MODEL: Partnerships: 8 universities as host institutions and more than 70 collaborators (educational and research organizations).
SUBJECT: Multidisciplinary. Covers: Arts and humanities; social sciences; science, engineering, technology; and geography; health and life sciences.
FUNCTION: To advance education and research by promoting the best of the Web through evaluation and collaboration. RDN's vision is to create knowledge from Internet resources and in doing so, enable people to fulfill their potential. RDN brings together the best Web sites for education and develops associated services to embed these resources in teaching, learning and research.
PRIMARY AUDIENCE: Academic community
STATUS: As of late 2005, RDN comprises eight subject hubs. After a period of review, analysis and internal consultation, RDN aims to build upon and re-establish its position in the further and higher education environment and in the Internet information environment. To this end RDN will:
  • Move to a new organizational structure
  • Integrate hardware and software platforms
  • Introduce a more holistic performance measurement framework
  • Implement the outcomes of a strategic branding exercise and review of visual identity so an established service will move into a new mode of delivery in mid-2006, and into a third phase of evolution.
SIZE: Number of records (as of July 2005): Altis 4,020; artifact 5,500; BIOME 30,700; EEVL 12,415; GEsource 8,400; HUMBUL 10,000; PSIgate 13,500; SOSIG 26,800. TOTAL: 111,335
USE: Per month: ~12 million pages served; ~740K Internet (RDN catalog) searches
ACCOMPLISHMENTS:
1. Launch of the GEsource World Guide service.
2. Additions and updates to the Virtual Training Suite of online Internet tutorials.
3. Creation of the RDN Executive at MIMAS, Manchester Computing, the University of Manchester and the start of the strategic change process for the RDN.
CHALLENGES:
1. Differentiating the service from search engines.
2. Embedding the service or parts of the service in VLEs and more widely in the learning, teaching and research process.
3. Understanding changing user needs in terms of subject searching, indexing, level of description of resource required.
TOOLS OR RESOURCES NEEDED:
1. Automatic metadata creation tools.
2. Visualization technology.
3. Text mining tools.
4. Cross-searching technologies.
5. Portal technology
GOALS OF NEXT GENERATION RESOURCE: The name of the next generation of the RDN will be Intute. The next generation Web site aims:
  • To establish Intute as the successor to the RDN and its Hubs, where existing users can find the services they know.
  • To attract and retain new users of Intute.
  • To differentiate Intute from search engines and gateways.
  • To convince students, researchers, academics, teachers, and librarians / intermediaries to use Intute to make intelligent use of the Internet for education and research.
  • To promote the people who create Intute and convey the concept of the service as authoritative mentor of the Internet.

In mid-2006 after re-structuring and re-branding, the former "RDN" débuted as "Intute," a new name combining Internet and Tutorial to connote the amalgamation of guided learning and online resource discovery. [[197]] As was the case with RDN, Intute is a free online service of high-quality Web resources for education and research, selected by a network of subject specialists. The new service consolidates RDN's eight subject gateways into four broad subject areas, bringing them together with a unified interface:

  • Arts and Humanities
    • Artifact: Arts and Creative Industries
    • Humbul: Humanities
  • Health and Life Sciences
    • BIOME
  • Science, Engineering and Technology
    • EEVL: Engineering, Mathematics and Computing
    • GEsource: Geography and the Environment
    • PSIgate: Physical Sciences
  • Social Sciences
    • Altis: Hospitality, Sports, Leisure and Tourism
    • SOSIG: Social Science Information Gateway

Intute aims to support interdisciplinary inquiry while still providing the same level of subject access via its domain-specific Internet resource catalogs. It serves as a resource base for integration into a variety of e-learning platforms (or VLEs, virtual learning environments) and discipline-specific portals, as evidenced by the incorporation of EEVL into the pilot engineering metasearch service, PerX (discussed in section 4.2.8 of this report). [[198]] Intute will be developed so it can be integrated more easily into institutional portals (and VLEs) whereby its resources/contents may re-emerge in customized subject-based portals, created according to local needs. The Intute Virtual Training Suite provides subject-based e-learning tutorials and resurrects the general training sequence intended to teach critical thinking skills, known as the Internet Detective.

Intute is a core service of JISC hosted by MIMAS (Manchester Information & Associated Services at the University of Manchester). Its operations adhere to policies and standards documented through a formal Service Level Agreement that covers: collection management policy; marketing and communications; strategic plan; technical integration plan; cataloging guidelines; and network services such as format conversion, printing, authentication and e-commerce. [[199]] JISC monitors and audits Intute's performance and produces quarterly service "trend data" consisting of statistical graphs charting the number of catalog searches, Web pages served, and HelpDesk inquiries. As of late April 2006, the data about RDN and its constituent service hubs is current to October 2005. [[200]] In the first of a two-part series in Ariadne, Hiom (2006) provides a "Retrospective on the RDN," along with a timeline of its milestones (http://www.rdn.ac.uk/projects/eprints-uk/). A future article will discuss the strategies underlying its transformation into Intute. [[201]]

4.5.4 California Digital Library (CDL) Metasearch Initiative

Update Table 32: CDL Metasearch Infrastructure Project based on DLF Survey responses, Fall 2005
CDL Metasearch Infrastructure Project
ORGANIZATIONAL MODEL: California Digital Library and UC campus libraries, partially grant-funded.
SUBJECT: Specific to each instance created via the infrastructure.
FUNCTION: Build localized metasearch services tailored to a particular audience and/or need.
PRIMARY AUDIENCE: Academic community
STATUS: Under development
SIZE: The first instance will include harvested metadata from 35,000 OAI records, along with at least 5 licensed databases.
USE: Not yet available
ACCOMPLISHMENTS:
1. Significant progress in integrating vendor metasearch product (ExLibris's MetaLib) with CDL's Common Framework software infrastructure.
2. Creation of a prototype harvesting tool (based on OAIster), harvesting, and evaluation of both harvest and harvesting tool.
3. Establishment of an SRU-compliant gateway to OAI harvested metadata.
CHALLENGES:
1. Gaining a better, more specific understanding of user needs (and how needs may vary depending on the institution and/or type of user).
2. Translating the prototype(s) into a production service.
3. Supporting the service once it is in use.
TOOLS OR RESOURCES NEEDED: Robust, flexible, open source tools for metadata normalization and enrichment, Web crawling, indexing, and searching, and widespread implementation of protocols (e.g., SRU) by vendors.
GOALS OF NEXT GENERATION RESOURCE: To enable the easy discovery of appropriate metasearch portals, or even to dynamically select the resources to be metasearched at the moment of query.

The California Digital Library (CDL) Metasearch Infrastructure Project aims to leverage CDL's experience over the past six years, since it first deployed "SearchLight," by establishing an infrastructure that will enable UC campus libraries to create customized search portals for specific audiences and purposes. The metasearch infrastructure adheres to the principles and standards set forth in CDL's Common Framework, an open, services-oriented technical architecture that provides an integrating framework for a full spectrum of library services, ranging from archival (where objects are stored locally, e.g., UC's Digital Preservation Repository), to metadata only (where only metadata is stored locally), to portals (where no data is stored locally). [[202]]

In addition to The American West metadata portal, discussed earlier in this report, the CDL has several other prototype portals in various stages of development: [[203]]

  • NSDL: In fulfillment of an NSF grant to build and enhance the NSDL, the Earth Sciences portal is geared to meet the needs of the UC geosciences community, and serve as an exemplar of integrating NSDL content into university library services. A pilot deployment of the Earth Sciences portal is being evaluated as of mid-May 2006.
  • SmartStart: Targeted to meet the needs of undergraduates and others outside of their area of expertise.
  • Discipline-Specific: The first deployment is targeted to meet the needs of faculty and graduate students in European studies (Western, Central and Eastern Europe, including Russia). [[204]]

An important background document, "Integrating Information Resources: Principles, Technologies, and Approaches" (Christenson and Tennant 2005) summarizes findings from CDL's studies of user needs relative to integrated searches. They report that from a user perspective, metasearch tools must exhibit:

  • Speed and simplicity of the Internet search engines (Google).
  • Convenience of e-commerce (Amazon). Participants' Internet usage has set high expectations for a service-rich environment.
  • Reliability, authority and integrity of information resources that are trusted because of the brand they carry (whether imparted by a prestigious library, academic institution, professional society, or even a state education curriculum). (Christenson and Tennant 2005, 3)

The report also fleshes out the content discovery and integration principles that should inform the design of CDL's metasearch services.

Table 35: CDL's Metasearch Infrastructure Principles
Content Discovery Principles
1. Only librarians like to search, everyone else likes to find. [[205]]
2. "Good enough" is just that.
3. All things being equal, one place to search is better than two or more.
4. What is not searched is as important as what is.
5. Place services as close to the user as possible.

Integration Principles
1. Integrate metadata whenever possible.
2. Exploit metadata similarities.
3. Honor metadata differences.
4. Offer appropriate methods to narrow the scope.
5. If you can't centralize metadata, centralize searching.

Source: Christenson and Tennant 2005, 4-6.

The authors then chart the strengths of five different methods of integration (ingesting, harvesting, Web crawling, syndicating, and metasearching) against the relevant integration principles, the conditions in which each method is the most appropriate, and the implementation obstacles.

Table 36: Metasearch Integration Methods and Practices
Methods compared: Enable Content Submission (Ingest) | Harvest Metadata (OAI-PMH) | Crawl Web Sites | Enable Content Syndication (RSS) | Enable Federated Queries (Metasearch)

Relevant integration principle(s)
  Ingest: All appropriate metadata stored internally in a common format uniformly applied
  Harvest: All appropriate metadata stored internally in a common format uniformly applied; Honor metadata differences; Offer appropriate methods to narrow the scope
  Crawl: Integrate metadata whenever possible
  RSS: Integrate metadata whenever possible
  Metasearch: If you can't centralize metadata, centralize searching

When is this method appropriate?
  Ingest: Local collection that will be locally accessed; Content is relatively stable; Resources available to provide rich native interface
  Harvest: Need access to large collections you don't want to have in-house; Need a fast search
  Crawl: To provide search access to a targeted collection of web sites
  RSS: Provide access to frequently updated content or news - current awareness
  Metasearch: When metadata cannot be centralized; When it is too time consuming for users to access multiple resources separately; Resource discovery; When users will need to find "just a few good things"; When content is frequently updated

What are the obstacles?
  Ingest: May not want to have "ownership" responsibilities; Storage space (at a very large scale)
  Harvest: Mostly obstacles related to providing access: Normalization of metadata; Duplication of records - aggregate providers; Varying levels of granularity amongst digital objects; Contextualizing results; Accounting for XML validation errors
  Crawl: Mostly obstacles related to providing access: How should search results be presented? By individual web page? By web site, then by page?
  RSS: At this point in time, still a limited number of resources in this format; Range of options yet to be fully explored
  Metasearch: Lack of standards; Avoiding "lowest common denominator" interface - losing benefits of native interface(s); Staff training; Maintenance time/costs; De-duping difficulties and vendor concerns about duplicate display; Vendor concerns about server overload (as target); Contextualizing results; Inadequate or non-existent search result ranking

Source: Reprinted with permission of the authors. (Christenson and Tennant 2005, 7)

As the authors explain:

A suitably developed metasearching infrastructure can be used to provide a common interface to content integrated by any or all of these methods. Thus the standard metasearch application marketed by software vendors is but one piece of a robust metasearching infrastructure. Such an infrastructure must be capable of using each of the integration techniques identified in the above chart while providing a unified user interface to the whole. (Ibid, 7)
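
The idea of one interface over content integrated by different methods can be illustrated with a small sketch. Here every target, whether a locally harvested index or a remote licensed database, is represented by a hypothetical search function, and results are merged with a naive title-based de-duplication. The target names, the record shape, and the ranking rule (records found in more targets sort first) are assumptions for illustration, not CDL's design.

```python
def metasearch(query, targets, per_target_limit=10):
    """Send one query to every target and merge the results into one list.

    `targets` maps a target name to a callable that takes a query string
    and returns a list of dicts with at least a "title" key.
    """
    merged = {}
    for name, search in targets.items():
        try:
            hits = search(query)[:per_target_limit]
        except Exception:
            continue  # one failing target should not sink the whole search
        for hit in hits:
            # Naive de-duplication key: lower-cased, whitespace-normalized title.
            key = " ".join(hit["title"].lower().split())
            entry = merged.setdefault(key, {"title": hit["title"], "sources": []})
            entry["sources"].append(name)
    # Stand-in ranking: more sources first, then alphabetical by title.
    return sorted(merged.values(), key=lambda e: (-len(e["sources"]), e["title"]))


# Hypothetical targets standing in for a harvested index and a licensed database.
def harvested_index(query):
    return [{"title": "Avian Influenza Overview"}, {"title": "Pandemic planning"}]


def licensed_database(query):
    return [{"title": "avian  influenza overview"}]


results = metasearch(
    "avian influenza",
    {"harvested_index": harvested_index, "licensed_database": licensed_database},
)
```

The sketch queries targets one after another; a production service would query them concurrently and would need far more careful record matching, which is exactly the de-duplication and ranking difficulty listed among the obstacles in Table 36.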
The schematic below depicts how users would access digital resources via different portals. They would also have a suite of tools readily available to manage citations and facilitate the re-use, manipulation, annotation, and integration of resources into teaching and research platforms (e.g., the Scholar's Box). [[206]] Currently under development, "the Scholar's Box is a tool that gives users "gather/create/share" functionality, enabling them to gather resources from multiple digital repositories in order to create personal and themed collections and other reusable materials that can be shared with others for teaching and research. The Scholar's Box can currently perform the following functions:
  • Gather: From California Digital Library, amazon.com, google.com, NSDL, RSS feeds, METS (digital library), WWW, CDL's metasearch system, and the local file system.
  • Create: Data and metadata gathered, annotated, and organized into personal collections via drag and drop
  • Share: IMS-CP, OpenOffice.org Presentation or Text document, PDF, HTML, a METS document, a set of Endnote references, Chandler Parcel, or sent to a weblog via the Blogger API" (Raymond Yee [[207]]).
The CDL infrastructure relies on a combination of open-source and commercial solutions. For example, CDL chose Ex Libris's MetaLib to enable access to commercial databases, externally-managed resources, and the Melvyl online catalog. [[208]] MetaLib interoperates with the Metasearching Infrastructure that manages other components in the context of CDL's Common Framework.

Source: http://www.cdlib.org/inside/projects/metasearch/diagramCF.jpg . Used with CDL's permission.

4.5.5 Current Issues and Future Directions

The deployment of these services responds to the "Amazoogle Effect" discussed earlier in this report by providing users with the types of services they have come to expect. These efforts are abetted by the work underway at NISO to develop standards and specifications that enable search and retrieval across multiple platforms and vendors, and linking to "appropriate" resources through OpenURL resolution systems. This work is carried out by NISO's OpenURL Framework for Context-Sensitive Services (http://www.niso.org/committees/committee_ax.html) and the NISO Metasearch Initiative. The second initiative brings together three major stakeholder groups organized into three cross-sector task groups dealing with Access Management; Collection and Service Descriptions; and Search and Retrieval specifications (Hodgson, Pace and Walker 2006). In NISO's words, "to move toward industry solutions NISO sponsored a Metasearch Initiative to enable:

  • metasearch service providers to offer more effective and responsive services;
  • content providers to deliver enhanced content and protect their intellectual property; and
  • libraries to deliver services that distinguish their services from Google and other free web services." (http://www.niso.org/committees/MS_initiative.html)
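To make the OpenURL linking mentioned above more concrete, the following minimal sketch builds an OpenURL 1.0 (Z39.88-2004) link in key/encoded-value form for a journal article. The resolver address and the citation values are placeholders invented for illustration; a real link would point to an institution's own resolver and carry the metadata of an actual citation.

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, citation):
    """Build an OpenURL 1.0 link in key/encoded-value (KEV) form.

    resolver_base is the address of a link resolver (hypothetical here);
    citation maps journal-format keys (atitle, jtitle, issn, ...) to values.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        # Declares that the referent is described with the journal format
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # The "rft." prefix marks each field as describing the referent
    for key, value in citation.items():
        params["rft." + key] = value
    return resolver_base + "?" + urlencode(params)

# Placeholder citation and resolver, not real data
link = build_openurl(
    "https://resolver.example.edu/openurl",
    {
        "atitle": "An Example Article",
        "jtitle": "Example Journal",
        "issn": "0000-0000",
        "date": "2005",
    },
)
print(link)
```

The resolver, rather than the source database, decides which copy is "appropriate" for the user, which is why the link carries only a description of the item and not a fixed target address.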
Available as of July 2005, the NISO Metasearch XML Gateway (MXG) Implementers' Guide (version 0.3) describes the MXG protocol that enables service providers to expose their content and services to a metasearch engine. (Such a gateway has been implemented, for example, by Berkeley Electronic Press's ResearchNow portal, described in section 4.2.11).
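MXG is generally described as building on the SRU (Search/Retrieve via URL) protocol. On that assumption, the sketch below shows the general shape of an SRU 1.1 searchRetrieve request and how a metasearch engine might read the hit count from a response. The gateway address and the sample response are invented for illustration, and the query forms a given provider accepts would need to be checked against the Implementers' Guide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by SRU 1.1 response elements
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

def build_search_url(base_url, cql_query, start=1, maximum=10):
    """Return an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)

def count_hits(response_xml):
    """Extract the total number of matching records from a response."""
    root = ET.fromstring(response_xml)
    return int(root.findtext("srw:numberOfRecords", namespaces=SRW_NS))

# Hand-written sample response (assumption), trimmed to the fields used here
SAMPLE_RESPONSE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
</srw:searchRetrieveResponse>"""

# Hypothetical gateway address
url = build_search_url("http://gateway.example.org/mxg",
                       'dc.title = "digital libraries"')
print(url)
print(count_hits(SAMPLE_RESPONSE))
```

Because the request is an ordinary URL and the response is XML, a content provider can expose such a gateway without adopting a heavier protocol such as Z39.50.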

The first set of deliverables and recommendations was presented at a NISO workshop in September 2005; these documents are available along with the workshop presentations at the NISO Metasearch Initiative Web site. Among the important recent developments, NISO Z39.92-200x, the Information Retrieval Service Description Specification, was released by the Collection and Service Descriptions task group for trial use through October 2006. "This standard defines a method of describing Information Retrieval oriented electronic services, including but not limited to those services made available via the Z39.50, SRU/SRW, and OAI protocols. The ZeeRex standard addresses the need for machine readable descriptions of services in order to enable automatic discovery of and interaction with previously unknown systems. It specifies an abstract model for service description and a binding to XML for interchange." [[209]]
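As a rough illustration of what a machine-readable service description makes possible, the sketch below parses a small, hand-written record laid out in the style of a ZeeRex 2.0 explain document and derives the service address and searchable indexes from it. The host, database, and index values are invented, and the trial-use Z39.92 draft may differ in detail from the layout assumed here.

```python
import xml.etree.ElementTree as ET

# Namespace of the ZeeRex 2.0 explain format
ZR = {"zr": "http://explain.z3950.org/dtd/2.0/"}

# Hand-written sample record (assumption), not taken from a real service
SAMPLE = """<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <serverInfo protocol="SRU" version="1.1" transport="http">
    <host>sru.example.org</host>
    <port>8080</port>
    <database>catalog</database>
  </serverInfo>
  <databaseInfo>
    <title>Example Catalog</title>
  </databaseInfo>
  <indexInfo>
    <index>
      <title>Title</title>
      <map><name set="dc">title</name></map>
    </index>
    <index>
      <title>Author</title>
      <map><name set="dc">creator</name></map>
    </index>
  </indexInfo>
</explain>"""

def describe_service(xml_text):
    """Summarize protocol, endpoint, and index names from an explain record."""
    root = ET.fromstring(xml_text)
    info = root.find("zr:serverInfo", ZR)
    host = info.findtext("zr:host", namespaces=ZR)
    port = info.findtext("zr:port", namespaces=ZR)
    database = info.findtext("zr:database", namespaces=ZR)
    indexes = []
    for name in root.findall("zr:indexInfo/zr:index/zr:map/zr:name", ZR):
        # Qualify each index with its context set, e.g. "dc.title"
        indexes.append(name.get("set") + "." + name.text)
    return {
        "protocol": info.get("protocol"),
        "version": info.get("version"),
        "endpoint": f"{info.get('transport')}://{host}:{port}/{database}",
        "indexes": indexes,
    }

print(describe_service(SAMPLE))
```

A metasearch engine that can read such a record needs no manual configuration to learn where a previously unknown target lives or which fields it can search.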

Library service vendors, as active contributors to and beneficiaries of the NISO Metasearch Initiative, are entering the metasearch market and designing new applications based on layered architectures that are intended to consolidate information search results and meet user needs from "discovery to delivery," as exemplified by Ex Libris's new "Primo" metasearch architecture below.

Source: Webinar presentation, "Primo: an Exclusive Peek from Ex Libris," Tamar Sadeh, May 9, 2006. Reproduced with permission.

This architecture helps to "create a superb user experience layer, decoupled from back-office functions, separating data creation and maintenance from its discovery." The publishing platform enables libraries to leverage resources irrespective of source, enrich the data, and expose hidden collections. Meanwhile the user is presented with a system that recognizes him (Hello, John Smith) and with results that can be refined, extended, altered, and displayed in various ways, as exemplified by the two prototype screenshots below (Tamar Sadeh, Ex Libris Webinar, May 9, 2006).

Source: Webinar presentation, "Primo: an Exclusive Peek from Ex Libris," Tamar Sadeh, May 9, 2006. Reproduced with permission.

Source: Webinar presentation, "Primo: an Exclusive Peek from Ex Libris," Tamar Sadeh, May 9, 2006. Reproduced with permission.

The California Digital Library is uniquely positioned to develop its own systemwide metasearch infrastructure, relying on a combination of locally developed, open-source, and proprietary tools and systems. A challenge for all academic libraries is finding the appropriate balance between components and services developed internally and those purchased externally. With more than 600 people from 29 countries participating in Ex Libris's early preview of Primo, it seems that many libraries have already begun to consider their options.

Finally, it is worth reiterating that the goal of metasearch in this context is not simplified "one-stop shopping," but the creation of a distributed information environment that can deliver subsets of resources, services, and tools to users according to their particular needs. [[210]]
