1
Automated Generation of METS Records for Digital Objects
  • Nate Trail
  • Library of Congress
  • ntra@loc.gov
2
Task:
  • Automatically convert the existing HTML presentation of a multi-volume monograph into an XML (METS/MODS) structure.
  • Merge data from the Voyager records and the file system.
3
"Start"
  • Start:
  • A Dictionary of Painters (an eight-volume set).
  • Finish:
  • 50047442/default.html
4
Need:
  • Location of files
  • Identifier in metadata store
  • Object type
  • Business rules
  • (Location of HTML presentation)


5
Get the Voyager record

  • I used SRU and formatted the results as MODS:
  • bath.lccn=50047442 and result=mods
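The request above can be sketched as a small URL builder. This is a minimal sketch, not the presenter's code: the base URL is a hypothetical placeholder, and it assumes that "result=mods" on the slide corresponds to the standard SRU recordSchema parameter.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, lccn, schema="mods"):
    """Build an SRU searchRetrieve URL that looks up one record by LCCN."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f"bath.lccn={lccn}",  # CQL query using the Bath profile index
        "recordSchema": schema,        # assumed equivalent of "result=mods"
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


# The base URL below is a placeholder, not the real Voyager SRU endpoint.
url = build_sru_url("http://example.org/sru", "50047442")
```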


6
Filter for MODS only

  • This step could include error checking for SRU diagnostics or other problems.
  • Stylesheet: getMods.xsl
  • Result: Voyager Bib
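The slides do this step in XSLT (getMods.xsl). As an illustration only, the same filtering and diagnostic check might look like this in Python; the function name and error handling are assumptions, not a translation of the stylesheet.

```python
import xml.etree.ElementTree as ET

NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "diag": "http://www.loc.gov/zing/srw/diagnostic/",
    "mods": "http://www.loc.gov/mods/v3",
}


def extract_mods(sru_xml):
    """Return the first MODS record in an SRU response, or raise on problems."""
    root = ET.fromstring(sru_xml)

    # Fail loudly if the server reported any SRU diagnostics.
    diagnostics = root.findall(".//diag:diagnostic", NS)
    if diagnostics:
        messages = [
            d.findtext("diag:message", default="(no message)", namespaces=NS)
            for d in diagnostics
        ]
        raise ValueError("SRU diagnostics: " + "; ".join(messages))

    mods = root.find(".//mods:mods", NS)
    if mods is None:
        raise ValueError("no MODS record in SRU response")
    return mods
```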
7
Get Item Level Data
  • Query the Voyager Oracle tables using JDBC and ESQL in an XSP page.
  • items query
8
Crawl the directories
  • Build a full file listing and add it to the MODS data:
  • Result: list of title's files
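A directory crawl of this kind could be sketched as follows. The layout is an assumption: the sketch treats each top-level subdirectory under the title's directory as one volume, which the slides do not confirm, and the function name is hypothetical.

```python
import os


def list_volume_files(title_root):
    """Map each top-level subdirectory (assumed: one per volume) to its files.

    Files that sit directly in title_root are grouped under the key "".
    Paths are relative to title_root and use forward slashes.
    """
    listing = {}
    for dirpath, dirnames, filenames in os.walk(title_root):
        dirnames.sort()  # walk subdirectories in a stable order
        rel_dir = os.path.relpath(dirpath, title_root)
        volume = "" if rel_dir == "." else rel_dir.split(os.sep)[0]
        for name in sorted(filenames):
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            listing.setdefault(volume, []).append(rel_path.replace(os.sep, "/"))
    return listing
```

The resulting dictionary can then be merged into the MODS data, for example as one entry per volume.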
9
Legacy Presentation
  • If the HTML presentation page is known, extract its descriptive data:
  • Stylesheet: pilkHtmlGrab.xsl
  • Results: details from HTML
10
Create METS for Title
  • Includes relatedItem elements for each volume found on the file system.
  • Stylesheet: makePilkBibMETS.xsl
  • Results: TOC METS
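The title-level record might be assembled along these lines. This is a sketch under stated assumptions rather than a rendering of makePilkBibMETS.xsl: the relatedItem type ("constituent"), the identifier type ("local"), and the function name are all guesses, and the fileSec and structMap that a complete METS record needs are omitted.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)


def make_title_mets(lccn, title, volume_ids):
    """Build a skeletal title-level METS with one relatedItem per volume."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": lccn})
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")

    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = title

    # One relatedItem per volume found on the file system.
    # type="constituent" is an assumption; the slides do not name the type.
    for vol_id in volume_ids:
        related = ET.SubElement(
            mods, f"{{{MODS}}}relatedItem", {"type": "constituent"}
        )
        ET.SubElement(
            related, f"{{{MODS}}}identifier", {"type": "local"}
        ).text = vol_id
    return mets
```

Calling ET.tostring(make_title_mets(...), encoding="unicode") serializes the record with mets: and mods: prefixes.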
11
Create METS for Volumes
  • Each contains a relatedItem “host” pointing to the title level record.
  • Stylesheet: makePilkMETS.xsl
  • Results: volume METS (v. 7)
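A volume-level record with its "host" pointer could be sketched like this. The function name, the identifier type, and the flat file group are assumptions for illustration; makePilkMETS.xsl itself is not reproduced here, and the structMap is again omitted.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def make_volume_mets(volume_id, title_id, file_paths):
    """Build a skeletal volume METS whose MODS points back to the title."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": volume_id})
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")

    # relatedItem type="host" identifies the title-level record.
    host = ET.SubElement(mods, f"{{{MODS}}}relatedItem", {"type": "host"})
    ET.SubElement(host, f"{{{MODS}}}identifier", {"type": "lccn"}).text = title_id

    # List the volume's files in a single file group (an assumed layout).
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    for n, path in enumerate(file_paths, start=1):
        file_el = ET.SubElement(group, f"{{{METS}}}file", {"ID": f"file{n}"})
        ET.SubElement(
            file_el,
            f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": path},
        )
    return mets
```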
12
Display
  • METS objects can be plugged into the “LC Presents: Music, Theater and Dance” display:


  • TOC page: 50047442/default.html
  • Volume page: 50047442.02/default.html


13
"And now a diversion into..."
  • And now a diversion into how METS objects can be displayed in HTML:
14
Digital object display
  • We use XML data and XSL stylesheets to present views of our digital objects in HTML, merging the files, metadata, and display templates on the fly.
15
Digital object display (2)
  • We create profiles for digital objects, with behaviors and views based on their characteristics, and then read the METS files to determine where the parts of the object are, how to display them, etc.
16
Digital object display (3)
  • Each display is built using small tools that break the task into interchangeable parts. This enables us to enhance components without affecting the whole site.
17
Navigation
18
Page Turning
19
Bibliographic Data
20
XML Source
21
Data Setup
22
Breadcrumbs
23
Display
24
Conclusions
  • This process can convert legacy presentations into METS objects that serve as SIPs for a repository.
25
Conclusions (2)
  • Or it can be used to automate the ingest of page scans from barcoded books:
  • Crawl the barcode directories
  • Map each barcode to its LCCN
  • Begin from the SRU step
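The first two steps of that workflow might be sketched as below. The sketch assumes each subdirectory of a scan root is named for a book's barcode and that a barcode-to-LCCN mapping has already been loaded into a dictionary (for instance from the item-level query); both are illustrative assumptions, and the function name is hypothetical.

```python
import os


def lccns_for_barcode_dirs(scan_root, barcode_to_lccn):
    """Match barcode-named directories to LCCNs.

    Returns (found, unknown): a dict of barcode -> LCCN for directories with
    a known mapping, and a list of directory names that have none.
    """
    found, unknown = {}, []
    for name in sorted(os.listdir(scan_root)):
        if not os.path.isdir(os.path.join(scan_root, name)):
            continue  # ignore stray files next to the barcode directories
        lccn = barcode_to_lccn.get(name)
        if lccn is None:
            unknown.append(name)
        else:
            found[name] = lccn
    return found, unknown
```

Each LCCN in the first return value can then feed the SRU step, while the second list flags directories that need manual review.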
26
Questions?
  • Nate Trail – Library of Congress
  • ntra@loc.gov