Tuesday, 4/11/06
Halbert - DLF 2006 Spring
Focused Crawling (cont.)
•Why build:
–needed something that worked
–needed something unencumbered by IP (intellectual property) restrictions
–needed something easier for digital librarians to use
•Guided bootstrapping:
–Ability to utilize phrase/keyword lists
–Gleaning seeds through a search engine (Google)
–Seeding through the Open Directory (also used for the negative set)
–Seeding/training via OAI repositories
–Development of phrase lists with a phrase finder
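As a rough illustration of how a phrase/keyword list might guide bootstrapping, the sketch below ranks candidate seed URLs by how many phrases from a list appear in a sample of their text. This is not the crawler described in the talk: the function names (`phrase_score`, `rank_seeds`), the scoring rule, and the example URLs are all assumptions made for illustration only.

```python
import re


def phrase_score(text, phrases):
    """Return the fraction of phrases found in the text (case-insensitive).

    Hypothetical scoring rule, chosen only for illustration.
    """
    if not phrases:
        return 0.0
    # Collapse runs of whitespace so multi-word phrases still match.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = sum(1 for phrase in phrases if phrase.lower() in normalized)
    return hits / len(phrases)


def rank_seeds(seed_pages, phrases, threshold=0.0):
    """Rank candidate seeds best-first, dropping scores at or below threshold.

    seed_pages maps URL -> sample text (e.g. a search snippet, a directory
    description, or a metadata record harvested from a repository).
    """
    scored = [(phrase_score(text, phrases), url) for url, text in seed_pages.items()]
    # Sort by descending score, then by URL for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(url, score) for score, url in scored if score > threshold]


if __name__ == "__main__":
    phrases = ["civil war", "plantation"]
    pages = {
        "http://example.org/a": "Letters from the Civil  War era",
        "http://example.org/b": "Plantation records and Civil War diaries",
        "http://example.org/c": "Cheap flights",
    }
    for url, score in rank_seeds(pages, phrases):
        print(url, score)
```

Pages that score at or below the threshold could instead be kept as negative examples, in the spirit of the negative set mentioned above; that step is left out of this sketch.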