Spring 2006 DLF Forum: Melvyl Recommender Project, 19
3 April, 2006
Ranking
●Using built-in Lucene capability
●“Boosted” with circulation data
–~9 million UCLA circulation transactions
–September 1999 – May 2005
–Data from two systems: Taos, Voyager
●“Boosted” with holdings data
–For 10 UC campuses, provided by OCLC
The idea is to boost certain documents to augment Lucene’s built-in ranking. Boosting only reorders the results; it does not affect what is retrieved.
Boosts are calculated in advance using summary tables constructed from the holdings and circulation data in MySQL. They are applied at query time, not index time, so we are able to switch between them at will and compare.
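As a rough illustration of query-time boosting, here is a minimal Python sketch. It is not the project's code: the function name `rerank`, the `weight` parameter, the log-damped formula, and the sample record IDs are all assumptions made for the example. Plain dictionaries stand in for the precomputed summary tables. The point it shows is that the boost changes only the order of the hits, never which hits are returned, and that swapping one boost table for another needs no reindexing.

```python
import math

def rerank(hits, boosts, weight=0.5):
    """Reorder search hits using a precomputed boost table.

    hits   -- list of (record_id, base_score) pairs from the search engine
    boosts -- dict mapping record_id to a count (e.g. loans or holdings);
              stands in for a summary table (hypothetical structure)
    weight -- how strongly the count influences the order (assumed value)
    """
    def boosted_score(hit):
        record_id, base_score = hit
        count = boosts.get(record_id, 0)
        # Log damping is an assumption: it keeps very popular items
        # from swamping the text-relevance score.
        return base_score * (1.0 + weight * math.log1p(count))

    # sorted() returns every hit it was given, so the result set is unchanged.
    return sorted(hits, key=boosted_score, reverse=True)

# Invented sample data for illustration only.
hits = [("rec-a", 2.0), ("rec-b", 1.8), ("rec-c", 0.9)]
circulation = {"rec-b": 120, "rec-c": 3}
holdings = {"rec-a": 10, "rec-c": 9}

by_circulation = rerank(hits, circulation)  # boost chosen at query time
by_holdings = rerank(hits, holdings)        # switch tables without reindexing
unboosted = rerank(hits, {})                # empty table keeps the base order
```

Because the boost table is just an argument to the call, the same query can be run with circulation boosts, holdings boosts, or none, and the three orderings compared side by side.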
The UCLA circulation dataset is particularly valuable, as we’ll see later in this discussion:
- it retained anonymized but persistent patron IDs
Massaging circulation data:
- absorbed much more time than working out the algorithms.
- differences in data structure and numbering systems between the two data sets
- needed to create linkages to the Melvyl records.
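The linking and summarizing step might look roughly like the Python sketch below. Everything concrete in it is hypothetical: the actual export formats of the two circulation systems are not described here, so the item IDs, the per-system lookup tables, and the function name `summarize_circulation` are invented for illustration. It shows only the general shape of the work: map each system's own identifiers onto a shared record ID, count transactions per record, and keep track of what could not be linked.

```python
from collections import Counter

def summarize_circulation(transactions_by_system, id_maps):
    """Build per-record circulation counts from several source systems.

    transactions_by_system -- dict: system name -> list of item IDs,
                              one entry per circulation transaction
    id_maps                -- dict: system name -> dict mapping that
                              system's item ID to a shared record ID
    Returns (counts, unlinked): a Counter keyed by shared record ID and
    the number of transactions that could not be linked.
    """
    counts = Counter()
    unlinked = 0
    for system, item_ids in transactions_by_system.items():
        mapping = id_maps[system]
        for item_id in item_ids:
            record_id = mapping.get(item_id)
            if record_id is None:
                unlinked += 1  # no linkage found for this transaction
            else:
                counts[record_id] += 1
    return counts, unlinked

# Invented sample data: two systems with different numbering schemes.
transactions = {
    "system_one": ["T001", "T001", "T002", "T999"],
    "system_two": ["000123", "000456"],
}
id_maps = {
    "system_one": {"T001": "record-1", "T002": "record-2"},
    "system_two": {"000123": "record-1", "000456": "record-3"},
}

counts, unlinked = summarize_circulation(transactions, id_maps)
```

Counting the unlinked transactions separately gives a quick measure of how much of the circulation data actually reaches the catalog records.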
Holdings:
•Considered a set offered by RLG (FRBRized and linked by ISBN): poor coverage.
•Obtained a set from OCLC linked by OCLC number: much better coverage.
•Weighed the use of WorldCat-wide vs. UC-wide holdings: UC collections are very different from WorldCat-wide in terms of what is highly held.