Dynamic de-duplication of bibliographic data for user services
Los Alamos National Laboratory, Research Library
DLF Forum, October 26, 2004, Baltimore, MD
•Strategy:
–Batch processing
–Bibliographic key matching
–Complex heuristics
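A minimal sketch of batch bibliographic key matching, for illustration only. The slides do not specify how keys are built; the record fields (`first_author`, `title`, `year`, `start_page`), the truncation lengths, and the function names below are all assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def bib_key(record):
    """Build a hypothetical match key from a few normalized fields."""
    surname = normalize(record.get("first_author", ""))[:8]
    title = normalize(record.get("title", ""))[:20]
    year = str(record.get("year", ""))
    page = str(record.get("start_page", ""))
    return "|".join([surname, year, page, title])


def batch_dedup(records):
    """Group the whole collection by key; each group is one duplicate cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[bib_key(record)].append(record)
    return list(clusters.values())
```

Because the key is computed for every record in the collection, any change to `bib_key` means running `batch_dedup` over the full collection again.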
•Issues:
–Extensive processing time
–Scalability problem in light of growing data collection
–Revision of heuristics requires reprocessing of collection
•Explore alternative:
–On-the-fly de-duplication
–De-duplication approach that is appropriate for citation matching
–Flexibility regarding revision of matching approach
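The alternative above could be sketched as follows: de-duplication is applied only to the records returned for a given request, and the matching rule is passed in as a function. This is an illustrative assumption, not the approach described in the talk; the record fields and the two key functions are hypothetical.

```python
def dedup_on_the_fly(results, key_func):
    """Keep the first record seen for each key, preserving result order."""
    seen = {}
    for record in results:
        seen.setdefault(key_func(record), record)
    return list(seen.values())


def strict_key(record):
    """Hypothetical rule: exact title and year must agree."""
    return (record["title"], record["year"])


def loose_key(record):
    """Hypothetical revised rule: ignore case and surrounding whitespace."""
    return (record["title"].strip().lower(), record["year"])


results = [
    {"title": "Citation matching", "year": 2004, "source": "A"},
    {"title": "citation matching ", "year": 2004, "source": "B"},
    {"title": "Batch processing", "year": 2003, "source": "A"},
]

strict = dedup_on_the_fly(results, strict_key)
loose = dedup_on_the_fly(results, loose_key)
```

Switching from `strict_key` to `loose_key` changes the outcome of the next request without touching stored data, which is the kind of flexibility the last bullet asks for.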