Dynamic de-duplication of bibliographic data for user services
Los Alamos National Laboratory, Research Library
DLF Forum, October 26, 2004, Baltimore, MD
LANL De-duplication Problem
•LANL Research Library locally hosts a large data collection
–A&I databases: ISI Citation Databases, Inspec, BIOSIS, Engineering Index, …
–Full-text collections: Elsevier, Wiley, APS, IOP, …
•Duplicates in LANL data collection:
–amongst bibliographic records
–between bibliographic records and citations
–amongst citations
•De-duplication need:
–join records from several databases that describe the same work
–find works that cite a given work