Spring 2006 DLF Forum: Melvyl Recommender Project, 15
3 April, 2006
Spelling Correction
● Chose index-based strategy
  – “N-gram” speller from Lucene:
    ● “primer” => pri prim rime imer mer
    ● form query from n-grams
    ● retain top 100, rank by closeness to original word
  – Modified in several ways
    ● adjust for transpositions and insertions
    ● use metaphones
    ● boost on word frequencies
● Tested successfully on Wikipedia and aspell datasets
Work done by Martin Haye.
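
The n-gram lookup on this slide can be illustrated with a minimal sketch. This is not Lucene's code or the project's implementation: the gram sizes (3 and 4, chosen to match the “primer” example), the in-memory index, and the function names `ngrams`, `grams_for`, `build_index`, and `candidates` are all assumptions made for illustration. It covers only the first stage (query by shared n-grams, keep the top 100); re-ranking by closeness to the original word is a separate step.

```python
from collections import Counter, defaultdict


def ngrams(word, n):
    """Return all contiguous substrings of length n, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def grams_for(word, sizes=(3, 4)):
    # Gram sizes 3 and 4 are an assumption based on the "primer" example.
    return [g for n in sizes for g in ngrams(word, n)]


def build_index(vocabulary):
    """Map each n-gram to the set of vocabulary words containing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for g in grams_for(word):
            index[g].add(word)
    return index


def candidates(misspelled, index, limit=100):
    """Return up to `limit` words, ordered by number of shared n-grams."""
    hits = Counter()
    for g in set(grams_for(misspelled)):
        for word in index.get(g, ()):
            hits[word] += 1
    return [w for w, _ in hits.most_common(limit)]
```

For example, with a vocabulary of "primer", "primes", "timer", and "banana", the query "primr" shares the grams pri, rim, and prim with both "primer" and "primes", so those two words come back as candidates and the others do not.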

Modifications:
 - adjusted to recognize that transpositions and insertions are the most common errors
 - applied the “double metaphone” algorithm used in aspell (captures similar-sounding words based on phonetics)
 - boosted based on word frequencies in the index
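
Two of these modifications can be sketched as follows. This is a hedged illustration, not the project's actual scoring: it uses the optimal string alignment distance (a standard edit distance in which an adjacent transposition counts as one edit) and adds a logarithmic frequency boost. The scoring formula, the `boost` weight of 0.1, and the names `osa_distance`, `score`, and `rank` are assumptions. The double metaphone step is omitted because it needs a phonetic encoder outside the Python standard library.

```python
import math


def osa_distance(a, b):
    """Optimal string alignment distance: insertion, deletion,
    substitution, and adjacent transposition each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def score(misspelled, candidate, freq, boost=0.1):
    # Hypothetical formula: normalized similarity plus a log-frequency boost.
    longest = max(len(misspelled), len(candidate), 1)
    similarity = 1.0 - osa_distance(misspelled, candidate) / longest
    return similarity + boost * math.log1p(freq)


def rank(misspelled, cands, frequencies):
    """Order candidates from best to worst score."""
    return sorted(cands,
                  key=lambda w: score(misspelled, w, frequencies.get(w, 0)),
                  reverse=True)
```

Under these assumptions, "pirmer" is one edit away from "primer" (a single transposition) rather than two, and for the input "teh" a frequent word such as "the" outranks an equally close but rarer word such as "ten".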