1
|
- Robin L. Chandler
- California Digital Library
- April 12, 2006
|
2
|
- Provide a snapshot of what it means to participate as an OCA member
- Scope of project
- Roles and responsibilities
- Resources required – costs to library
- Unpack 10 cents/page model
- Work being accomplished
|
3
|
- Book content files and metadata (out-of-copyright & public domain)
are openly available for access and reuse by individuals and
organizations
- Services are built by third parties (commercial and non-commercial)
supporting POD, audio books, web services, etc.
- Availability of information spurs innovation and research
|
4
|
- OCA provides a unique and timely partnership for the UC Libraries to
digitize out-of-copyright books
- UC Libraries supply content: 80-100K books
- California Digital Library (CDL) coordinates
- Internet Archive (IA) provides scanning capacity, online access to
content files, & long-term file management
- Third-party funding supports IA production capacity to scan
|
5
|
- OCA provides UC Libraries opportunity to consider new business models:
- Funds invested in licensing online collections of out-of-copyright
materials could be reallocated to digital reformatting
- Digital reformatting can help support our efforts to build shared print
collections and seek to avoid cost in use of scarce shelving space
- Investment in digitization of local materials creates access (for our
patrons) to third-party materials not currently available to them
|
6
|
- The UC Libraries are financially supporting the digitization of
approximately 800 books from the collections of the UC
Berkeley Mathematics Library
- The work to scan these materials is nearly complete and the books are
available through the IA’s Texts website
http://www.archive.org/details/texts
|
7
|
|
8
|
|
9
|
- UC: Select, retrieve, inspect, transfer & return books to shelves
- IA: queries Z39.50 catalog to metafetch bibliographic records
- IA: Digitization: creating content files & metadata
- Camera raw (color), JP2K, PDF, DjVu, OCR
- Scandata.xml (and others)
|
10
|
- IA: Quality control & (online) access
- IA: Long-term management of OCA content files
- UC: Library validation and ingest of UC Library metadata and content
files for preservation storage
- UC: Link Melvyl (local catalog) to IA content
|
11
|
- Locating space for two scanning station centers in UC system
- Making infrastructure improvements
- Developing a tracking database (MySQL)
|
12
|
- Developing pick-lists:
- Leveraging strengths of union and local catalogs
- Developing strategies to identify biographies (discovering the limits
of supplied subject headings)
- Minimizing duplication across multi-library collections
- Managing “bound with” volumes and foldouts
- Determining file formats to preserve locally
- Developing ingest process: METS feeder
- Planning storage requirements
|
13
|
- 10 cents a page per book
- $2 million = 20 million pages = 67K books
- Scribe capacity (scanning stations)
- 500 pages/hour per scribe
- 14 hours/day per scribe
- 7,000 pages/day per scribe
- 70,000 pages per day (10 scribes)
- 350,000 pages per week (10 scribes)
- 1000 books per week (10 scribes), if an average book is 350 pages
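
The arithmetic on this slide can be checked with a short sketch. All figures come from the bullets above except the five-day working week, which is an assumption inferred from 350,000 pages per week divided by 70,000 pages per day. The function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the 10 cents/page model.
# Values are taken from the slide unless marked as an assumption.

COST_PER_PAGE_CENTS = 10      # 10 cents a page
BUDGET_DOLLARS = 2_000_000    # $2 million
PAGES_PER_HOUR = 500          # per scribe
HOURS_PER_DAY = 14            # two shifts
SCRIBES = 10
DAYS_PER_WEEK = 5             # assumption: inferred from 350,000 / 70,000
AVG_PAGES_PER_BOOK = 350


def pages_for_budget(budget_dollars, cost_per_page_cents):
    """Number of pages a budget covers, using integer cents to avoid float error."""
    return budget_dollars * 100 // cost_per_page_cents


def weekly_throughput(pages_per_hour, hours_per_day, scribes,
                      days_per_week, avg_pages_per_book):
    """Daily and weekly scanning capacity for a group of scribes."""
    per_scribe_day = pages_per_hour * hours_per_day
    per_day = per_scribe_day * scribes
    per_week = per_day * days_per_week
    return {
        "pages_per_scribe_day": per_scribe_day,
        "pages_per_day": per_day,
        "pages_per_week": per_week,
        "books_per_week": per_week // avg_pages_per_book,
    }


if __name__ == "__main__":
    total_pages = pages_for_budget(BUDGET_DOLLARS, COST_PER_PAGE_CENTS)
    capacity = weekly_throughput(PAGES_PER_HOUR, HOURS_PER_DAY, SCRIBES,
                                 DAYS_PER_WEEK, AVG_PAGES_PER_BOOK)
    print("pages covered by budget:", total_pages)
    for name, value in capacity.items():
        print(name + ":", value)
    # Weeks needed to scan the budgeted pages at full capacity.
    print("weeks at full capacity:", round(total_pages / capacity["pages_per_week"], 1))
    # Pages per book implied by the slide's 67K book estimate.
    print("implied pages per book:", round(total_pages / 67_000))
```

The last line shows that the 67K book estimate corresponds to roughly 300 pages per book, while the weekly throughput bullet assumes 350 pages per book, so the two estimates appear to rest on different average book lengths.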
|
14
|
- Rick Prelinger, Acting OCA Director
- Robert Miller, Director of Books
- Stu Blair, Director of Software Engineering
|
15
|
- California Digital Library (CDL) project management staff (1.5 FTE)
- UC systemwide coordinating team (15 members)
- CDL Ingest / Preservation Repository Group (approx 2.0 FTE)
- CDL Catalog Group (approx .5 FTE)
- Am Lit and US History Bibliographers (20 members)
- UCB Library / Northern Regional Library Facility team (approx 2.0 FTE)
|
16
|
- Northern Regional Library Facility (managed by UC Berkeley)
- 1 Gbps data pipeline
- 98 amps for 10 scribes
- Restroom facility for 2nd shift
- Southern Regional Library Facility (managed by UCLA)
- Currently assessing needs, but predicting upgrades for network
switches, fiber optic cable and need for increasing electricity
capacity
|
17
|
- Library providing funds:
- Staffing throughout the project workflow
- Physical space
- Scanning centers (where Scribes are housed), book processing, queue
storage (book trucks)
- Terabyte servers for preservation repository
- Infrastructure upgrades
- Electrical, networking
- Facilities to accommodate two shifts (14 hour days)
|
18
|
- OCA staff are enthusiastic, patient and creative in partnering with
libraries to achieve efficient production capacity and a quality product
- April 10, 2006: first American Literature books from UC Libraries
became available on the OCA
|
19
|
|
20
|
|
21
|
|
22
|
|
23
|
- Current focus: build digital content
- Secondary objective: create deliverables to help build digital content
in the long-term
- e.g., OCA Working Group: Scribe Scanning Workflow
- Requirements
- Procedures
- Case Studies / Workflows
|
24
|
- Please feel free to contact me at robin.chandler@ucop.edu
|