|
1
|
- Kat Hagedorn
- University of Michigan
|
|
2
|
- OAI = Open Archives Initiative
- OAI-PMH = Open Archives Initiative Protocol for Metadata Harvesting
- OAIster = an SP that allows searching of almost all DP metadata; housed
at University of Michigan
- DP = OAI data provider
- SP = OAI service provider
|
|
3
|
- Inception in e-prints community
- Santa Fe Convention: result of 1999 OAI meeting
- Became the OAI-PMH
- Designed as a protocol that “develops and promotes interoperability
standards that aim to facilitate the efficient dissemination of content”
*
- Essentially, harvesting metadata
|
|
4
|
|
|
5
|
- Verbs allow communication among DPs and SPs
- Every DP must implement all 6 verbs
- Not all SPs (need to) use all 6 verbs
- Examples:
  - http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListMetadataFormats
  - http://sunsite2.berkeley.edu:8088/oaicat/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc
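As a rough sketch of how requests like the examples above are put together, the snippet below builds a request URL from a base URL, a verb, and its arguments. The helper name `build_request` is hypothetical and only for illustration. The verb names are the six defined by OAI-PMH 2.0, and the base URL is the Berkeley example above.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL (hypothetical helper, illustration only)."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb goes first, followed by any verb-specific arguments.
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


base = "http://sunsite2.berkeley.edu:8088/oaicat/OAIHandler"
print(build_request(base, "ListRecords", metadataPrefix="oai_dc"))
```

An SP that only needs records could get by with `ListRecords` alone, which is why not every SP uses all six verbs.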
|
|
6
|
- DPs use commercial or home-grown software implementing the OAI-PMH verbs
to make their metadata available to SPs
- SPs retrieve, or “harvest”, the metadata using harvester software and
those same OAI-PMH verbs, and use that metadata in a service
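To make the harvesting step a little more concrete, here is a minimal sketch of what a harvester might do with a `ListRecords` response. The sample response, its identifiers, and the function name `parse_list_records` are all made up for illustration; only the XML namespaces come from the OAI-PMH and Dublin Core specifications. No network access is involved.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily trimmed ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:identifier>http://example.org/objects/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "link": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    # An empty or missing token means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records, token)
```

A real harvester would repeat the request with the returned `resumptionToken` until no token comes back.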
|
|
7
|
- Institutions interested in being DPs must have
  - Um, well, metadata to share
  - Some level of technical expertise to install DP software
  - Administrative buy-in
- Institutions interested in being SPs must have
  - Reason(s) for wanting to become an SP
  - An infrastructure for developing a service using the harvested metadata
  - Some level of technical expertise to install SP software (i.e., harvester)
|
|
8
|
- Treating it as a project, at least at first
- Developing a maintenance and sustainability plan
- Developing a collection development policy
- Devoting some amount of programming time to it
|
|
9
|
- What’s our strategy?
- We’re a bit different: we harvest everything and use anything that has
a link to a digital object, whether freely available or restricted
- Other SPs may choose to be subject specific, format specific or any
other kind of specific
|
|
10
|
|
|
11
|
- Metadata varies widely
  - Formats (dc, mods, mets, marc, qdc, olac)
  - Exhaustive vs. bare minimum
  - (Let’s just call a spade a spade: a lot of it is bad.)
  - More on this from Jenn
- And also, XML and UTF-8 character errors
  - About 6% of current repositories on OAIster have them
|
|
12
|
- Sample date values
  - <date>2-12-01</date>
  - <date>2002-01-01</date>
  - <date>0000-00-00</date>
  - <date>1822</date>
  - <date>between 1827 and 1833</date>
  - <date>18--?</date>
  - <date>November 13, 1947</date>
  - <date>SEP 1958</date>
  - <date>235 bce</date>
  - <date>Summer, 1948</date>
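One way to see how messy these values are is to try pulling a single year out of each. The sketch below is an illustration only, not OAIster's actual normalization: the function name `normalize_year` is hypothetical, and the rules are assumptions (take the first plausible four-digit year from 1000 to 2099, give up on everything else, including BCE dates and ambiguous two-digit years).

```python
import re

# Assumption: a "plausible" year is four digits in the range 1000-2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def normalize_year(value):
    """Return the first plausible four-digit year in value, or None."""
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


samples = [
    "2-12-01",
    "2002-01-01",
    "0000-00-00",
    "1822",
    "between 1827 and 1833",
    "18--?",
    "November 13, 1947",
    "SEP 1958",
    "235 bce",
    "Summer, 1948",
]

for value in samples:
    print(f"{value!r:28} -> {normalize_year(value)}")
```

Under these assumed rules, four of the ten sample values ("2-12-01", "0000-00-00", "18--?", "235 bce") cannot be normalized at all, and a range such as "between 1827 and 1833" collapses to its first year.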
|
|
13
|
- Pie-in-the-sky: all DPs create perfect metadata
- But… the reality is that there will always be cleaning to do
- We run metadata through a transformer
  - Handles as much bad UTF-8 as it can
  - Filters out records we can’t use
  - Adds normalized metadata to the fields we can normalize
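The three transformer steps could be sketched roughly as follows. This is not the actual OAIster transformer: the record layout (plain dicts with `title`, `identifier`, and `date` keys), the function names, the "usable means it has an http(s) link" rule, and the `norm_year` field are all assumptions made for illustration.

```python
import re

# Assumption: a normalizable year is four digits in the range 1000-2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def clean_text(raw):
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def transform(records):
    """Clean, filter, and normalize a list of record dicts (hypothetical layout)."""
    kept = []
    for rec in records:
        # Step 1: repair what we can of bad UTF-8.
        rec = {k: clean_text(v) if isinstance(v, bytes) else v
               for k, v in rec.items()}
        # Step 2: drop records without a link to a digital object.
        link = rec.get("identifier", "")
        if not link.startswith(("http://", "https://")):
            continue
        # Step 3: add a normalized field next to the original value.
        match = YEAR.search(rec.get("date", ""))
        rec["norm_year"] = int(match.group(1)) if match else None
        kept.append(rec)
    return kept


harvested = [
    {"title": b"Caf\xe9 menu", "identifier": "http://example.org/1", "date": "SEP 1958"},
    {"title": "No link", "identifier": "local-42", "date": "1822"},
]
print(transform(harvested))
```

Keeping the original `date` value alongside the normalized one means nothing the DP supplied is lost if the normalization rule turns out to be wrong.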
|
|
14
|
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
- Potential to make the harvested and cleaned metadata available again to
data providers, search engines, librarians, etc., for their use
  - Pro: availability to a wider audience
  - Con: Run the risk of complicating the simple harvesting model
|
|
19
|
- No time to show
  - What other metadata formats provide
  - What associated thumbnails offer
  - What subject clustering looks like
- But the gist is that there’s a lot we can do with metadata, as long as it
  - is Available
  - follows Best practices
  - is used Consistently across the repository
- Ask for details in the breakout sessions!
|