1
Lifecycle
…of OAI
…of DPs and SPs
  • Kat Hagedorn
  • University of Michigan
2
Funny acronyms
  • OAI = Open Archives Initiative
    • OAI-PMH = Open Archives Initiative Protocol for Metadata Harvesting
    • OAIster = an SP that allows searching of almost all DP metadata; housed at University of Michigan
  • DP = OAI data provider
  • SP = OAI service provider
3
OAI’s history
  • Inception in e-prints community
  • Santa Fe Convention: result of 1999 OAI meeting
  • Became the OAI-PMH
  • Designed as a protocol that “develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content” *
  • Essentially, harvesting metadata
4
(Kinda lame) OAI graphic
5
The verbs
  • Verbs allow communication among DPs and SPs
  • Every DP must implement all 6 verbs
  • Not all SPs (need to) use all 6 verbs
  • Examples:
    • http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListMetadataFormats
    • http://sunsite2.berkeley.edu:8088/oaicat/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc
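
As a rough sketch of how such request URLs are put together, the Python below builds one from a base URL, a verb and optional arguments. The helper name `build_request` is a hypothetical, illustrative choice and not part of any library. The six verb names are the ones defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL (hypothetical helper for illustration)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb goes first, followed by any verb-specific arguments.
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


url = build_request(
    "http://sunsite2.berkeley.edu:8088/oaicat/OAIHandler",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

The printed URL matches the second example above. Whether that server still answers is a separate question.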
6
Restating the obvious
  • DPs use commercial or home-grown software implementing the OAI-PMH verbs to make their metadata available to SPs
  • SPs retrieve, or “harvest”, the metadata using harvester software and those same OAI-PMH verbs, and use that metadata in a service
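
The harvesting side can be sketched in a few lines of Python. This is a minimal illustration, not OAIster's harvester: it issues ListRecords, yields each record, and follows the `resumptionToken` that OAI-PMH uses to page through large result sets. The names `harvest` and `http_fetch` are assumptions made for this example. Error handling, such as OAI error responses and retries, is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Fetch a URL and return the raw response bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield every <record> element a DP returns for ListRecords."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing a different `fetch` function makes it possible to try the loop against saved responses instead of a live DP.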
7
Sharing involves…
  • Institutions interested in being DPs must have
    • Um, well, metadata to share
    • Some level of technical expertise to install DP software
    • Administrative buy-in
  • Institutions interested in being SPs must have
    • Reason(s) for wanting to become an SP
    • An infrastructure for developing a service using the harvested metadata
    • Some level of technical expertise to install SP software (i.e., harvester)
8
Being a DP or SP means…
  • Treating it as a project, at least at first
  • Developing a maintenance and sustainability plan
  • Developing a collection development policy
  • Devoting some amount of programming time to it
9
Example OAI workflow: OAIster
  • What’s our strategy?
  • We’re a bit different: we harvest everything and use anything that has a link to a digital object, whether freely available or restricted
  • Other SPs may choose to be subject specific, format specific or any other kind of specific
10
First step: harvest the metadata
11
And first sticky wicket
  • Metadata varies widely
  • Formats (dc, mods, mets, marc, qdc, olac)
  • Exhaustive vs. bare minimum
    • (Let’s just call a spade a spade: a lot of it is bad.)
    • More on this from Jenn
  • And also, XML and UTF-8 character errors
    • About 6% of current repositories on OAIster have them
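
To make the two error classes concrete, here is a small Python sketch that classifies a raw response as undecodable UTF-8, malformed XML, or fine. The function name `check_response` and its return labels are assumptions for illustration. It is not how OAIster detects these errors.

```python
import xml.etree.ElementTree as ET


def check_response(raw):
    """Classify raw response bytes as 'ok', 'bad-utf8', or 'bad-xml'."""
    try:
        # OAI-PMH responses are required to be UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bad-utf8"
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return "bad-xml"
    return "ok"


print(check_response(b"<a>caf\xc3\xa9</a>"))  # valid UTF-8, well-formed
print(check_response(b"<a>caf\xe9</a>"))      # Latin-1 byte, not valid UTF-8
print(check_response(b"<a><b></a>"))          # mismatched tags
```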
12
Example: metadata variation
  • Sample date values
    • <date>2-12-01</date>
    • <date>2002-01-01</date>
    • <date>0000-00-00</date>
    • <date>1822</date>
    • <date>between 1827 and 1833</date>
    • <date>18--?</date>
    • <date>November 13, 1947</date>
    • <date>SEP 1958</date>
    • <date>235 bce</date>
    • <date>Summer, 1948</date>
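
A deliberately crude Python sketch shows why values like these are hard to normalize. It pulls out the first plausible four-digit year and gives up on everything else. The function name `normalize_year` and the accepted range (1000 to 2099) are assumptions for this example, not OAIster's actual rules.

```python
import re

# Assumption: a "plausible" year is a standalone number from 1000 to 2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def normalize_year(value):
    """Return the first plausible year in a date string, or None."""
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


samples = [
    "2-12-01", "2002-01-01", "0000-00-00", "1822",
    "between 1827 and 1833", "18--?", "November 13, 1947",
    "SEP 1958", "235 bce", "Summer, 1948",
]
for value in samples:
    print(f"{value!r:28} -> {normalize_year(value)}")
```

Four of the ten sample values (2-12-01, 0000-00-00, 18--?, 235 bce) yield no year, and the date range keeps only its first year. A real transformer needs more rules than this.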
13
So, second step is to clean
  • Pie-in-the-sky: all DPs create perfect metadata
  • But…reality is that there will always be cleaning
  • We run metadata through a transformer
    • Handles as much bad UTF-8 as it can
    • Filters out records we can’t use
    • Adds normalized metadata to fields we can normalize
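
The three transformer tasks can be sketched as one small Python function. Everything here is an illustrative assumption rather than the real OAIster transformer: the record shape (a dict mapping field names to lists of byte strings), the rule that a usable record has at least one URL in `identifier`, and the field name `norm_year`.

```python
import re

# Assumption: a "plausible" year is a standalone number from 1000 to 2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def transform(raw_records):
    """Clean harvested records (dicts of field name -> list of byte strings)."""
    cleaned = []
    for raw in raw_records:
        # 1. Bad UTF-8: decode leniently, so undecodable bytes become
        #    U+FFFD instead of raising an error.
        record = {
            field: [value.decode("utf-8", errors="replace") for value in values]
            for field, values in raw.items()
        }
        # 2. Filter: keep only records with at least one URL identifier.
        identifiers = record.get("identifier", [])
        if not any(i.startswith(("http://", "https://")) for i in identifiers):
            continue
        # 3. Normalize: add a year field when a date value contains one.
        years = [m.group(1) for m in map(YEAR.search, record.get("date", [])) if m]
        if years:
            record["norm_year"] = [years[0]]
        cleaned.append(record)
    return cleaned


records = [
    {"title": [b"Caf\xe9 letters"], "identifier": [b"http://example.org/1"], "date": [b"SEP 1958"]},
    {"title": [b"No link"], "identifier": [b"local-42"], "date": [b"1822"]},
]
print(transform(records))
```

The normalized value is added alongside the original `date` rather than overwriting it, so nothing the DP supplied is lost.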
14
Transformation yields…
15
Third step: make it available
16
Fourth step: get the digital object
17
Fifth step: use
18
Sixth step: vicious circle
  • Potential to make the harvested and cleaned metadata available again to data providers, search engines, librarians, etc., for their use
  • Pro: availability to a wider audience
  • Con: Run the risk of complicating the simple harvesting model
19
The ABCs to remember
  • No time to show
    • What other metadata formats provide
    • What associated thumbnails offer
    • What subject clustering looks like
  • But the gist is that there’s a lot we can do with metadata, as long as it
    • is Available
    • follows Best practices
    • is used Consistently across the repository
  • Ask for details in the breakout sessions!