1
Participating in the OCA: University of California Libraries
  • Robin L. Chandler
  • California Digital Library
  • April 12, 2006
2
Major Points
  • Provide a snapshot of what it means to participate as an OCA member
    • Scope of project
    • Roles and responsibilities
      • library contribution
    • Resources required – costs to library
      • Unpack the 10 cents/page model
    • Work being accomplished


3
OCA Vision… (according to me)
  • Book content files and metadata (out-of-copyright & public domain) are openly available for access and reuse by individuals and organizations
  • Services are built by third parties (commercial and non-commercial) supporting print-on-demand (POD), audio books, web services, etc.
  • Availability of information spurs innovation and research
4
UC Joining the Open Content Alliance
  • OCA provides a unique and timely partnership for the UC Libraries to digitize out-of-copyright books
  • UC Libraries supply content: 80,000–100,000 books
  • California Digital Library (CDL) coordinates
  • Internet Archive (IA) provides scanning capacity, online access to content files, & long-term file management
  • Third-party funding supports IA production capacity to scan
5
Business Case for Participation
  • OCA provides the UC Libraries an opportunity to consider new business models:
    • Funds invested in licensing online collections of out-of-copyright materials could be reallocated to digital reformatting
    • Digital reformatting can support our efforts to build shared print collections and help avoid the cost of using scarce shelving space
    • Investments in digitizing local materials create access (for our patrons) to third-party materials not currently available to them


6
Test Case: Debugging the Workflow
  • The UC Libraries are financially supporting the digitization of approximately 800 books from the collections of the UC Berkeley Mathematics Library
  • The work to scan these materials is nearly complete, and the books are available through the IA's Texts website: http://www.archive.org/details/texts


9
Establishing the Workflow

  • UC: Select, retrieve, inspect, transfer & return books to shelves
  • IA: Query the Z39.50 catalog to metafetch bibliographic records
  • IA: Digitization, creating content files & metadata
    • Camera raw (color), JP2K, PDF, DjVu, OCR
    • Scandata.xml (and others)
10
Establishing the Workflow (cont’d)
  • IA: Quality control & (online) access
  • IA: Long-term management of OCA content files
  • UC: Library validation and ingest of UC Library metadata and content files for preservation storage
  • UC: Link Melvyl (local catalog) to IA content


11
Getting your feet wet… it's not just about supplying books to feed the scanners
  • Locating space for two scanning centers in the UC system
  • Making infrastructure improvements
  • Developing a tracking database (MySQL)
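The slides do not describe the tracking database's schema. As a minimal sketch, assuming one row per book keyed by a hypothetical barcode with a single current-stage column, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The table, column, and stage names are all hypothetical, loosely following the "Establishing the Workflow" slides.

```python
import sqlite3

# Hypothetical stage names, loosely following the "Establishing the Workflow"
# slides; the real tracking database's stages are not given in the slides.
STAGES = [
    "selected",          # UC: chosen from a pick-list
    "transferred",       # UC: retrieved, inspected, sent to a scanning center
    "metadata_fetched",  # IA: bibliographic record fetched via Z39.50
    "scanned",           # IA: content files and metadata created
    "qc_passed",         # IA: quality control done, online access available
    "returned",          # UC: book back on the shelf
    "ingested",          # UC: files validated and ingested for preservation
    "linked",            # UC: Melvyl record linked to IA content
]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a one-table schema: one row per book, holding its current stage."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               barcode TEXT PRIMARY KEY,  -- hypothetical item identifier
               title   TEXT NOT NULL,
               stage   TEXT NOT NULL
           )"""
    )


def add_book(conn: sqlite3.Connection, barcode: str, title: str) -> None:
    """Register a book at the first workflow stage."""
    conn.execute(
        "INSERT INTO books (barcode, title, stage) VALUES (?, ?, ?)",
        (barcode, title, STAGES[0]),
    )


def advance(conn: sqlite3.Connection, barcode: str) -> str:
    """Move a book to the next stage and return the new stage name."""
    row = conn.execute(
        "SELECT stage FROM books WHERE barcode = ?", (barcode,)
    ).fetchone()
    if row is None:
        raise KeyError(barcode)
    index = STAGES.index(row[0])
    if index == len(STAGES) - 1:
        raise ValueError(f"{barcode} has already completed the workflow")
    new_stage = STAGES[index + 1]
    conn.execute("UPDATE books SET stage = ? WHERE barcode = ?", (new_stage, barcode))
    return new_stage


def count_by_stage(conn: sqlite3.Connection) -> dict[str, int]:
    """Report how many books are currently at each stage."""
    rows = conn.execute("SELECT stage, COUNT(*) FROM books GROUP BY stage")
    return {stage: count for stage, count in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_book(conn, "B0001", "Example title one")
    add_book(conn, "B0002", "Example title two")
    advance(conn, "B0001")
    print(count_by_stage(conn))  # one book "selected", one "transferred"
```

A single current-stage column keeps the sketch short; a production tracker would more likely log a timestamped row per stage change so that throughput at each step can be measured.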
12
Getting deeper in the water…
  • Developing pick-lists:
    • Leveraging strengths of union and local catalogs
    • Developing strategies to identify biographies (discovering the limits of supplied subject headings)
    • Minimizing duplication across multi-library collections
    • Managing “bound with” volumes and foldouts
  • Determining file formats to preserve locally
  • Developing the ingest process: METS feeder
  • Planning storage requirements
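The slides do not say how duplicates were identified across campus collections. A minimal sketch, assuming each holding record carries a shared match key (here a hypothetical OCLC-number field) and that campuses are ranked by preference, keeps one copy per title and sets "bound with" volumes aside for manual review. The Holding type, its fields, and the ranking rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    campus: str        # holding library, e.g. "UCB" (hypothetical code)
    oclc_number: str   # assumed match key shared by copies of the same title
    barcode: str       # hypothetical item identifier
    bound_with: bool   # True if bound with other titles


def build_pick_list(holdings, campus_priority):
    """Return (picks, review): one copy per title, plus bound-with volumes.

    Among duplicate copies, the campus listed earliest in campus_priority
    wins; campuses not listed rank after all listed ones.
    """
    rank = {campus: i for i, campus in enumerate(campus_priority)}
    unranked = len(rank)
    chosen: dict[str, Holding] = {}
    review: list[Holding] = []
    for holding in holdings:
        if holding.bound_with:
            # Set aside for a person to decide how to handle.
            review.append(holding)
            continue
        current = chosen.get(holding.oclc_number)
        if current is None or (
            rank.get(holding.campus, unranked) < rank.get(current.campus, unranked)
        ):
            chosen[holding.oclc_number] = holding
    return list(chosen.values()), review


if __name__ == "__main__":
    holdings = [
        Holding("UCLA", "12345", "A1", False),
        Holding("UCB", "12345", "B1", False),
        Holding("UCB", "67890", "B2", True),
    ]
    picks, review = build_pick_list(holdings, ["UCB", "UCLA"])
    print([h.barcode for h in picks])   # ['B1']
    print([h.barcode for h in review])  # ['B2']
```

Matching on a single identifier is the simplest possible rule; in practice, records lacking a shared identifier would need looser matching on title, author, and date.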


13
IA Production Model
  • 10 cents per page
    • $2 million = 20 million pages ≈ 67 K books
  • Scribe capacity (scanning stations)
    • 500 pages/hour per Scribe
    • 14 hours/day per Scribe
    • 7,000 pages/day per Scribe
    • 70,000 pages/day (10 Scribes)
    • 350,000 pages/week (10 Scribes, 5-day week)
    • 1,000 books/week (10 Scribes), if an average book is 350 pages
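The production arithmetic above can be checked with a few lines of Python. The constants come from the slide; the 5-day week is an assumption inferred from the step from 70,000 pages per day to 350,000 per week. The cost bullet's 67 K books works out to roughly 300 pages per book, while the throughput bullets assume 350, so the sketch computes both.

```python
# Figures taken from the "IA Production Model" slide.
COST_PER_PAGE_CENTS = 10   # 10 cents per page
PAGES_PER_HOUR = 500       # per Scribe
HOURS_PER_DAY = 14         # two shifts
DAYS_PER_WEEK = 5          # assumption: implied by 70,000/day -> 350,000/week


def pages_for_budget(budget_dollars: int) -> int:
    """Pages a budget buys at the per-page rate (integer cents avoid float error)."""
    return budget_dollars * 100 // COST_PER_PAGE_CENTS


def books_for_pages(pages: int, pages_per_book: int) -> int:
    """Approximate number of books, rounded to the nearest whole book."""
    return round(pages / pages_per_book)


def weekly_pages(scribes: int) -> int:
    """Pages scanned per week by a fleet of Scribes."""
    return PAGES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_WEEK * scribes


if __name__ == "__main__":
    pages = pages_for_budget(2_000_000)
    print(pages)                          # 20000000
    print(books_for_pages(pages, 300))    # 66667 (the slide's ~67 K)
    print(books_for_pages(pages, 350))    # 57143
    week = weekly_pages(10)
    print(week)                           # 350000
    print(week // 350)                    # 1000 books per week
    print(round(pages / week, 1))         # 57.1 weeks for $2 million of pages
```

At these rates, 10 Scribes would need a little over a year of 5-day weeks to scan the 20 million pages that $2 million buys.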
14
Working with OCA staff: primary contacts

  • Rick Prelinger, Acting OCA Director
  • Robert Miller, Director of Books
  • Stu Blair, Director of Software Engineering
15
UC Libraries Staffing
  • California Digital Library (CDL) project management staff (1.5 FTE)
  • UC systemwide coordinating team (15 members)
  • CDL Ingest / Preservation Repository Group (approx. 2.0 FTE)
  • CDL Catalog Group (approx. 0.5 FTE)
  • American Literature and US History Bibliographers (20 members)
  • UCB Library / Northern Regional Library Facility team (approx. 2.0 FTE)



16
UC Infrastructure Improvements
  • Northern Regional Library Facility (managed by UC Berkeley)
    • 1 Gb/s data pipeline
    • 98 amps of electrical capacity for 10 Scribes
    • Restroom facility for 2nd shift
  • Southern Regional Library Facility (managed by UCLA)
    • Currently assessing needs; anticipating upgrades to network switches and fiber-optic cable, plus increased electrical capacity


17
Costs to the UC Libraries
  • The libraries provide funds for:
    • Staffing throughout the project workflow
    • Physical space
      • Scanning centers (where Scribes are housed), book processing, queue storage (book trucks)
    • Terabyte servers for the preservation repository
    • Infrastructure upgrades
      • Electrical, networking
      • Facilities to accommodate two shifts (14 hour days)
18
Reaching the OCA October 2006 Milestone
  • OCA staff are enthusiastic, patient, and creative in partnering with libraries to achieve efficient production capacity and a quality product
  • April 10, 2006: first American Literature books from the UC Libraries available on the OCA
23
Building Sustainability for OCA
  • Current focus: build digital content
  • Secondary objective: create deliverables to help build digital content in the long-term
    • e.g., OCA Working Group: Scribe Scanning Workflow
      • Requirements
      • Procedures
      • Case Studies / Workflows



24
Thank you
  • Please feel free to contact me at robin.chandler@ucop.edu