Dynamic de-duplication of bibliographic data for user services
Los Alamos National Laboratory, Research Library
DLF Forum, October 26, 2004, Baltimore, MD
Current LANL De-duplication Approach
•Strategy:
–Batch processing
–Bibliographic key matching
–Complex heuristics
•Issues:
–Extensive processing time
–Scalability problems as the data collection grows
–Revising the heuristics requires reprocessing the collection
•Explore alternative:
–On-the-fly de-duplication
–De-duplication approach that is appropriate for citation matching
–Flexibility regarding revision of matching approach
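Illustration (not from the slides): a minimal sketch of how bibliographic key matching might be applied on the fly to a result set at request time rather than to the whole collection in batch. The key recipe (first-author surname, year, first four title words), the record field names, and the functions bib_key and dedupe_on_the_fly are all hypothetical; the actual LANL keys and heuristics are not described here.

```python
import re
import unicodedata
from collections import OrderedDict


def _tokens(text):
    """Lowercase, strip accents and punctuation, and split into words."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def bib_key(record):
    """Build a match key (assumed recipe): surname | year | first 4 title words."""
    author = _tokens(record.get("first_author", ""))
    title = _tokens(record.get("title", ""))
    surname = author[0] if author else ""
    return "|".join([surname, str(record.get("year", "")), "".join(title[:4])])


def dedupe_on_the_fly(records):
    """Group the records of one result set by key, preserving first-seen order.

    Nothing is precomputed or stored, so changing bib_key takes effect on the
    next request without reprocessing the collection.
    """
    groups = OrderedDict()
    for rec in records:
        groups.setdefault(bib_key(rec), []).append(rec)
    return list(groups.values())


if __name__ == "__main__":
    results = [
        {"first_author": "Smith, J.", "year": 2004,
         "title": "Dynamic De-Duplication of Bibliographic Data.", "source": "A"},
        {"first_author": "smith j", "year": 2004,
         "title": "Dynamic de-duplication of bibliographic data", "source": "B"},
        {"first_author": "Jones, K.", "year": 2003,
         "title": "Citation matching", "source": "A"},
    ]
    for group in dedupe_on_the_fly(results):
        print(bib_key(group[0]), [r["source"] for r in group])
```

Because the key is computed per request, the cost scales with the size of the result set rather than the size of the collection. The trade-off is that exact key matching misses duplicates whose keys differ slightly, which is where additional heuristics would come in.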