Dynamic de-duplication of bibliographic data for user services
Los Alamos National Laboratory, Research Library
DLF Forum, October 26, 2004, Baltimore, MD
Optimizing likelihood
•Tune the likelihood scores to the dataset at hand
•Machine learning: build a model that assigns a weight to each field of the key
•Librarians:
–Were presented with a total of 3,000 pairs of keys
–Had to decide whether or not both keys of a pair represented the same work
–Result: clearer cut-off point between matches and non-matches
Example pair of keys presented for judgment:
DAVIS BJ ANN NY ACAD SCI 1964 121 404
DAVIS BJ ANN NY ACAD SCI 1964 2 404
Same work? yes / no
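
The idea of learning per-field weights from judged pairs can be illustrated with a minimal sketch. The slides do not say which model was used, so logistic regression is an assumption here, as are the five-field key layout (author, journal, year, volume, page), the exact-match comparison, and every name in the code. The toy pairs are invented for illustration and are not taken from the 3,000 pairs judged by the librarians.

```python
import math

# Assumed key layout; the slides only show keys as flat strings.
FIELDS = ("author", "journal", "year", "volume", "page")

def agreement(key_a, key_b):
    """One 0/1 feature per field: 1.0 when the two values match exactly."""
    return [1.0 if a == b else 0.0 for a, b in zip(key_a, key_b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, key_a, key_b):
    """Likelihood (0..1) that two keys describe the same work."""
    x = agreement(key_a, key_b)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(labelled_pairs, epochs=1000, lr=0.5):
    """Fit per-field weights with plain stochastic gradient ascent."""
    weights = [0.0] * len(FIELDS)
    bias = 0.0
    for _ in range(epochs):
        for key_a, key_b, same in labelled_pairs:
            x = agreement(key_a, key_b)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = same - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * err
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

# Invented reference key and judged pairs (1 = same work, 0 = different).
REF = ("SMITH J", "J EXAMPLE", "1990", "12", "101")
TOY_PAIRS = [
    (REF, ("SMITH J", "J EXAMPLE", "1990", "12", "101"), 1),
    (REF, ("SMITH JA", "J EXAMPLE", "1990", "12", "101"), 1),
    (REF, ("SMITH J", "J EXAMPLE", "1990", "", "101"), 1),
    (REF, ("SMITH J", "J EXAMPLE", "1990", "12", "250"), 0),
    (REF, ("SMITH J", "J EXAMPLE", "1991", "13", "101"), 0),
    (REF, ("LEE K", "OTHER J", "1990", "12", "101"), 0),
]

weights, bias = train(TOY_PAIRS)

# Place the cut-off halfway between the weakest match and the strongest non-match.
match_scores = [score(weights, bias, a, b) for a, b, same in TOY_PAIRS if same]
other_scores = [score(weights, bias, a, b) for a, b, same in TOY_PAIRS if not same]
cutoff = (min(match_scores) + max(other_scores)) / 2

for name, w in zip(FIELDS, weights):
    print(f"{name:8s} weight = {w:.2f}")
print(f"cut-off = {cutoff:.2f}")
```

Fields whose disagreement tends to coincide with a "no" judgment end up with larger weights, which pushes the scores of matches and non-matches apart and makes the cut-off easier to place. A production setup would likely use fuzzy field comparison rather than exact matching.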