The idea is to boost certain documents to augment Lucene’s capability.
Boosting only reorders the results; it does not affect what is retrieved.
Boosts are calculated in advance using summary tables constructed from holdings and circulation data in MySQL.
They are applied at query time, not index time, so we can switch boosts at will and compare.
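A minimal sketch of query-time boosting as reordering, in Python rather than Lucene's own API. The summary table contents, the log-damped boost formula, and the names `SUMMARY`, `boost_for`, and `rerank` are all assumptions made for illustration; the notes do not state the actual formula. The point it shows is that the same hits come back either way, only in a different order, and that each boost can be switched on or off per query for comparison.

```python
import math

# Hypothetical precomputed summary table: record ID -> (holdings count, circulation count).
# In the setup described above, this would be read from the MySQL summary tables.
SUMMARY = {
    "rec1": (12, 340),
    "rec2": (1, 0),
    "rec3": (5, 25),
}


def boost_for(record_id, use_holdings=True, use_circulation=True):
    """Return a multiplicative boost >= 1.0 (the log damping is an assumption)."""
    holdings, circulation = SUMMARY.get(record_id, (0, 0))
    boost = 1.0
    if use_holdings:
        boost += math.log1p(holdings)
    if use_circulation:
        boost += math.log1p(circulation)
    return boost


def rerank(hits, **switches):
    """hits: list of (record_id, text_score). Returns the same hits, reordered."""
    return sorted(
        hits,
        key=lambda hit: hit[1] * boost_for(hit[0], **switches),
        reverse=True,
    )


# Text-relevance order as returned by the search engine (scores are made up).
hits = [("rec2", 2.0), ("rec3", 1.5), ("rec1", 1.0)]

boosted = rerank(hits)
unboosted = rerank(hits, use_holdings=False, use_circulation=False)
```

Because the boost is looked up when the query runs, changing the switches needs no reindexing, which is what makes side-by-side comparison cheap.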
The UCLA circulation dataset is particularly valuable, as we’ll see later in this discussion:
- it retained anonymized but persistent patron IDs
Massaging circulation data:
- absorbed much more time than working out the algorithms.
- differences in data structure and numbering systems between the two data sets
- needed to create linkages to the Melvyl records.
Holdings:
• Considered a set offered by RLG (FRBRized and linked by ISBN)...poor coverage.
• Obtained a set from OCLC linked by OCLC number...much better coverage.
• Weighed the use of WorldCat-wide vs. UC-wide holdings...UC collections are very different from WorldCat-wide in terms of what is highly held.
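The UC-wide vs. WorldCat-wide choice can be illustrated with a small Python sketch that counts holding institutions per OCLC number under each scope. The rows, institution names, and the `holdings_counts` helper are hypothetical and are not taken from the actual OCLC set; the sketch only shows how the same title can be highly held under one scope and not the other.

```python
from collections import defaultdict

# Hypothetical holdings rows: (OCLC number, holding institution, is a UC campus).
HOLDINGS = [
    ("100", "UCLA", True),
    ("100", "UCB", True),
    ("100", "UCSD", True),
    ("200", "UCLA", True),
    ("200", "LibA", False),
    ("200", "LibB", False),
    ("200", "LibC", False),
    ("200", "LibD", False),
]


def holdings_counts(rows, uc_only):
    """Count distinct holding institutions per OCLC number."""
    holders = defaultdict(set)
    for oclc_number, institution, is_uc in rows:
        if is_uc or not uc_only:
            holders[oclc_number].add(institution)
    return {number: len(institutions) for number, institutions in holders.items()}


uc_wide = holdings_counts(HOLDINGS, uc_only=True)
worldcat_wide = holdings_counts(HOLDINGS, uc_only=False)
```

In this made-up data, record 200 is the more widely held title overall but the less widely held one within UC, so the two scopes would produce different holdings boosts.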