Spring 2006 DLF Forum: Melvyl Recommender Project, 19
3 April, 2006
Ranking
●Using built-in Lucene capability
●“Boosted” with circulation data
–~9 million UCLA circulation transactions
–September 1999 – May 2005
–Data from two systems: Taos, Voyager
●“Boosted” with holdings data
–For 10 UC campuses, provided by OCLC
The idea is to boost certain documents to augment Lucene's built-in ranking. Boosting only reorders the results; it does not affect what is retrieved.
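A minimal sketch of "reordering, not retrieving", assuming a multiplicative combination of the Lucene score and a precomputed boost (the notes do not say how the two were combined). The function and argument names are hypothetical, and the re-ranking is done in plain Python outside Lucene purely for illustration:

```python
from typing import Dict, List, Tuple


def rerank(hits: List[Tuple[str, float]],
           boosts: Dict[str, float]) -> List[Tuple[str, float]]:
    """Reorder already-retrieved hits by Lucene score times a boost.

    hits:   (record_id, lucene_score) pairs returned by the query.
    boosts: hypothetical record_id -> boost factor; records without
            an entry keep a neutral boost of 1.0.
    No hit is added or dropped, so the retrieved set is unchanged.
    """
    rescored = [(rid, score * boosts.get(rid, 1.0)) for rid, score in hits]
    # Highest boosted score first; Python's sort is stable, so ties
    # keep Lucene's original relative order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

For example, boosting record `c` by 3.0 moves it from last to first, while `a` and `b` stay in the result list in their original relative order.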

Boosts are calculated in advance from summary tables built in MySQL from the holdings and circulation data. They are applied at query time rather than index time, so we can switch between boosting schemes at will and compare the results.
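The split between an offline step and a query-time step can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `sqlite3` stands in for MySQL, the table names `circ_summary` and `holdings_summary` and the log-damped boost formula are hypothetical, and the scores are toy values.

```python
import math
import sqlite3
from typing import Dict, List, Tuple

# sqlite3 stands in for MySQL; tables, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE circ_summary (record_id TEXT PRIMARY KEY, n INTEGER);
CREATE TABLE holdings_summary (record_id TEXT PRIMARY KEY, n INTEGER);
INSERT INTO circ_summary VALUES ('rec1', 0), ('rec2', 40), ('rec3', 5);
INSERT INTO holdings_summary VALUES ('rec1', 9), ('rec2', 1), ('rec3', 3);
""")


def precompute_boosts(table: str) -> Dict[str, float]:
    """Offline step: turn each summary count into a boost factor.

    log1p damping is an assumed choice so heavily used items do not
    swamp the text-relevance score. The table name comes from our own
    constants below, never from user input.
    """
    rows = conn.execute(f"SELECT record_id, n FROM {table}")
    return {rid: 1.0 + math.log1p(n) for rid, n in rows}


# Computed once, ahead of time; the Lucene index itself is never touched.
BOOSTS: Dict[str, Dict[str, float]] = {
    "none": {},
    "circulation": precompute_boosts("circ_summary"),
    "holdings": precompute_boosts("holdings_summary"),
}


def rank(hits: List[Tuple[str, float]], mode: str) -> List[str]:
    """Query-time step: choose a boost table and reorder the same hits."""
    table = BOOSTS[mode]
    ordered = sorted(hits, key=lambda h: h[1] * table.get(h[0], 1.0),
                     reverse=True)
    return [rid for rid, _ in ordered]
```

Because the boosts live outside the index, comparing schemes is just a matter of calling `rank(hits, "none")` and `rank(hits, "circulation")` on the same hit list; no reindexing is needed.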

The UCLA circulation dataset is particularly valuable, as we'll see later in this discussion:
 - it retained anonymized but persistent patron IDs
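The notes do not say how the patron IDs were anonymized. One common way to get IDs that are both anonymized and persistent is a keyed hash (HMAC), sketched below as an assumption; the key, the 16-character truncation, and the function names are all hypothetical:

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical secret kept by the library and never released with the data.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(patron_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a real patron ID to a stable, opaque token.

    Persistent: the same patron always yields the same token.
    Anonymized: the token cannot be reversed without the key.
    """
    digest = hmac.new(key, patron_id.strip().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # 64 bits is ample for a patron population


def items_by_patron(transactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (patron_id, item) transactions under pseudonymous patrons."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for patron_id, item in transactions:
        baskets[pseudonymize(patron_id)].add(item)
    return dict(baskets)
```

Persistence is what makes grouping possible: all of one patron's loans land under one token even though the real ID never appears in the data.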

Massaging the circulation data:
 - absorbed much more time than working out the algorithms
 - differences in data structure and numbering systems between the two data sets
 - needed to create linkages to the Melvyl records
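The linkage step can be pictured as "normalize each system's local number, then look it up in a per-system crosswalk to a Melvyl record". The sketch below is hypothetical throughout: the crosswalk contents, the ID formats, and the normalization rules are invented to illustrate the shape of the problem, not the actual Taos or Voyager numbering.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical crosswalks: each system's normalized local number -> Melvyl record.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "taos": {"T000123": "melvyl:1", "T000456": "melvyl:2"},
    "voyager": {"123": "melvyl:1", "789": "melvyl:3"},
}


def normalize(system: str, local_id: str) -> str:
    """Hypothetical cleanup so both numbering schemes match their crosswalks."""
    local_id = local_id.strip()
    if system == "taos":
        # Assume Taos numbers are "T" plus a zero-padded six-digit number.
        return "T" + local_id.lstrip("Tt").zfill(6)
    # Assume Voyager numbers may arrive with leading zeros.
    return local_id.lstrip("0")


def checkouts_per_record(
    transactions: Iterable[Tuple[str, str]],
) -> Tuple[Counter, Counter]:
    """Count checkouts per Melvyl record; tally unlinkable rows per system."""
    counts: Counter = Counter()
    unmatched: Counter = Counter()
    for system, local_id in transactions:
        record = CROSSWALK[system].get(normalize(system, local_id))
        if record is None:
            unmatched[system] += 1
        else:
            counts[record] += 1
    return counts, unmatched
```

Keeping an explicit `unmatched` tally per system is a cheap way to see how much circulation is lost at the linkage step.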

Holdings:
•Considered a set offered by RLG (FRBRized and linked by ISBN): poor coverage.
•Obtained a set from OCLC linked by OCLC number: much better coverage.
•Weighed WorldCat-wide vs. UC-wide holdings: what is highly held across UC collections is very different from what is highly held WorldCat-wide.
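A UC-wide holdings count per OCLC number can be sketched as the number of distinct campuses holding it, out of the 10 campuses in the OCLC set. The input shape `(oclc_number, campus)` and the linear boost formula are assumptions for illustration only:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

UC_CAMPUSES = 10  # the OCLC holdings data covered 10 UC campuses


def uc_holdings_counts(holdings: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UC campuses holding each OCLC number.

    Using a set means multiple copies at one campus count once.
    """
    campuses: Dict[str, Set[str]] = defaultdict(set)
    for oclc_number, campus in holdings:
        campuses[oclc_number].add(campus)
    return {oclc: len(held_by) for oclc, held_by in campuses.items()}


def holdings_boost(count: int, total: int = UC_CAMPUSES) -> float:
    """Hypothetical boost rising linearly from 1.0 (no campus) to 2.0 (all campuses)."""
    return 1.0 + count / total
```

Restricting the count to UC campuses is what makes the boost reflect UC collecting patterns rather than what is widely held across WorldCat.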