Dynamic de-duplication of bibliographic data for user services

Los Alamos National Laboratory, Research Library

DLF Forum, October 26, 2004, Baltimore, MD

Optimizing likelihood

•Tune the likelihood scores to the dataset at hand

•Machine learning: create a model that assigns weights to the fields of the key

•Librarians:

–Were presented with a total of 3,000 pairs of keys

–Had to decide whether or not both keys of a pair represented the same work

–Result: clearer cut-off point between matches and non-matches

DAVIS BJ ANN NY ACAD SCI 1964 121 404

DAVIS BJ ANN NY ACAD SCI 1964 2 404

yes
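
The idea of learning field weights from librarian judgments can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say which learning model was used, so logistic regression stands in for it here; the split of a key into author, journal, year, volume and page is assumed; and apart from the DAVIS BJ pair shown above, every key in the training list is made up for the example.

```python
import math

# Assumed layout of a key; the slides do not name the fields.
FIELDS = ("author", "journal", "year", "volume", "page")


def make_key(author, journal, year, volume, page):
    """Build a key as a dict of field name -> value (hypothetical helper)."""
    return dict(zip(FIELDS, (author, journal, year, volume, page)))


def agreement(key_a, key_b):
    """Return one value per field: 1.0 if the keys agree on it, else 0.0."""
    return [1.0 if key_a[f] == key_b[f] else 0.0 for f in FIELDS]


def score(weights, bias, features):
    """Likelihood (between 0 and 1) that two keys represent the same work."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(labelled_pairs, epochs=2000, rate=0.1):
    """Fit one weight per field by gradient descent on the log loss."""
    weights = [0.0] * len(FIELDS)
    bias = 0.0
    for _ in range(epochs):
        for key_a, key_b, same in labelled_pairs:
            x = agreement(key_a, key_b)
            error = score(weights, bias, x) - (1.0 if same else 0.0)
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias


# The pair from the slide: the keys differ only in the volume field.
davis_a = make_key("DAVIS BJ", "ANN NY ACAD SCI", "1964", "121", "404")
davis_b = make_key("DAVIS BJ", "ANN NY ACAD SCI", "1964", "2", "404")

# Made-up keys, for illustration only (not from the slides).
other_1 = make_key("SMITH A", "ANN NY ACAD SCI", "1964", "118", "77")
other_2 = make_key("JONES C", "J HYPOTHETICAL", "1964", "9", "12")
other_3 = make_key("DAVIS BJ", "J HYPOTHETICAL", "1970", "3", "15")

# Each entry: (key, key, judgment "same work?").
labelled_pairs = [
    (davis_a, davis_b, True),   # judged "yes" on the slide
    (davis_a, davis_a, True),
    (davis_a, other_1, False),
    (davis_a, other_2, False),
    (davis_a, other_3, False),
]

weights, bias = train(labelled_pairs)

for field, weight in zip(FIELDS, weights):
    print(f"{field:8s} weight = {weight:+.2f}")
print("DAVIS pair score:", round(score(weights, bias, agreement(davis_a, davis_b)), 3))
```

With real judgments in place of the toy list, pairs scoring above a chosen cut-off would be treated as matches. The sharper the learned weights separate the two groups, the clearer that cut-off becomes.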