Clustering Algorithms
Different clustering algorithms use the same bag-of-words representation but produce different results.
[Figure: bag-of-words term-document count matrix for documents a through l over the terms disease, war, dispossession, reservation, navajo, chumash, maize, dust, use, land, hunting, cattle, rights, water]
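The bag-of-words representation that all of these algorithms share keeps only how often each term occurs in each document, not word order. The minimal sketch below assumes lowercase whitespace tokenization and uses two short invented documents (hypothetical text, not the counts in the slide's figure). `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter

def bag_of_words(docs):
    """Map each document name to its vector of term counts over a shared vocabulary."""
    # Assumption: tokenization is just lowercasing plus whitespace splitting
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    vocab = sorted({w for words in tokenized.values() for w in words})
    vectors = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        # Word order is discarded; only how often each term occurs is kept
        vectors[name] = [counts[term] for term in vocab]
    return vocab, vectors

# Hypothetical example documents
docs = {
    "a": "Navajo reservation rights war war",
    "d": "water land use dust water",
}
vocab, vectors = bag_of_words(docs)
print(vocab)         # → ['dust', 'land', 'navajo', 'reservation', 'rights', 'use', 'war', 'water']
print(vectors["a"])  # → [0, 0, 1, 1, 1, 0, 2, 0]
print(vectors["d"])  # → [1, 1, 0, 0, 0, 1, 0, 2]
```

Real pipelines usually also drop stop words and often reweight counts (for example with TF-IDF); both are left out here to keep the representation visible.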
Algorithms and what they do:

K-Means or Hierarchical Clustering
- Groups the documents into K clusters (topics)
- Assigns one topic to each document

Probabilistic Clustering (e.g., Topic Model)
- Determines the topics addressed in the collection
- Assigns multiple topics to each document

NMF, Latent Semantic Analysis (LSA)
- Tells you which documents are semantically close
- Tells you which words are close to which documents (addresses synonymy in information retrieval)
[Figure labels: Environment; Native Americans]
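The K-Means row of the table (K clusters, exactly one topic per document) can be sketched on bag-of-words vectors as follows. This is a minimal illustration, not a reference implementation: the documents are invented, vectors are normalized to unit length (a common choice for text, assumed here), and the deterministic farthest-point initialization stands in for the usual randomized k-means++ seeding. All function names are hypothetical.

```python
import math

def count_vectors(docs):
    """Document-term count vectors over a sorted, whitespace-split vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    return [[d.split().count(t) for t in vocab] for d in docs]

def normalize(v):
    # Unit length, so documents are compared by word mix rather than by length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    # Assumption: deterministic farthest-point initialization instead of k-means++
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: every document joins exactly one cluster (nearest centroid)
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its member documents
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

# Hypothetical documents: three about Native American history, three about land use
docs = [
    "navajo reservation rights war",
    "chumash disease dispossession war",
    "navajo dispossession reservation",
    "water land use dust",
    "maize hunting land water",
    "cattle dust water land",
]
vectors = [normalize(v) for v in count_vectors(docs)]
labels = kmeans(vectors, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Each document receives a single label, which is the "one topic per document" limitation the table contrasts with topic models.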
MetaCombine used a scheme similar to LSA to build its topic clusters
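A minimal LSA sketch follows; it is not MetaCombine's actual scheme. It takes a truncated SVD of the term-document matrix (raw counts, no TF-IDF weighting, both assumptions) by running power iteration on the document Gram matrix A^T A, then compares documents in the k-dimensional latent space. The corpus and function names are hypothetical. Documents 0 and 2 share no terms, so their raw cosine similarity is 0, but both share a term with document 1, and LSA places them close together, which is the synonymy effect the table mentions.

```python
import math

def term_doc_matrix(docs):
    """Rows are terms, columns are documents; entries are raw counts (no TF-IDF)."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]

def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration plus deflation."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflation: subtract the found component so the next pass finds the next one
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsa_doc_vectors(docs, k):
    """Project documents into a k-dimensional latent space (truncated SVD of A)."""
    _, a = term_doc_matrix(docs)
    n = len(docs)
    # Gram matrix A^T A: its eigenvectors are A's right singular vectors and
    # its eigenvalues are the squared singular values
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    # Document i in latent space: (sigma_1 * v_1[i], ..., sigma_k * v_k[i])
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

# Hypothetical corpus
docs = [
    "reservation navajo",
    "reservation dispossession",
    "dispossession war",
    "water land",
    "land dust",
]
_, a = term_doc_matrix(docs)
raw = [list(col) for col in zip(*a)]  # column j of A is document j
latent = lsa_doc_vectors(docs, k=2)
print(cosine(raw[0], raw[2]))        # 0.0: documents 0 and 2 share no terms
print(cosine(latent[0], latent[2]))  # close to 1: linked through document 1
print(cosine(latent[0], latent[3]))  # close to 0: a different topic
```

Power iteration is used only to keep the sketch dependency-free; a numerical library's SVD routine would normally replace `top_eigenpairs`.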
The topic model was developed specifically for text problems
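The slide does not name a specific topic model; latent Dirichlet allocation (LDA) is assumed here as a common example. The sketch below fits LDA by collapsed Gibbs sampling and returns, for each document, a proportion for every topic, which is how a topic model assigns multiple topics to one document. The hyperparameters `alpha` and `beta`, the iteration count, and the corpus are illustrative assumptions; `lda_gibbs` is a hypothetical name. Because the sampler is stochastic, the exact proportions depend on the seed.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """LDA fitted by collapsed Gibbs sampling; returns (theta, top_words)."""
    rng = random.Random(seed)
    tokens = [d.split() for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    n_vocab = len(vocab)
    topics = range(n_topics)
    # Count tables: document-topic, topic-word, and tokens per topic
    ndk = [[0] * n_topics for _ in tokens]
    nkw = [{w: 0 for w in vocab} for _ in topics]
    nk = [0] * n_topics
    # Random initial topic for every token
    z = []
    for d, doc in enumerate(tokens):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Take this token out of the counts
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(topic t | all other assignments), up to a normalizing constant
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Smoothed topic proportions per document: one document can mix several topics
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(tokens)
    ]
    top_words = [sorted(vocab, key=lambda w: -nkw[t][w])[:3] for t in topics]
    return theta, top_words

# Hypothetical corpus; the last document mixes both themes
docs = [
    "navajo reservation dispossession war navajo",
    "chumash disease war dispossession",
    "water land dust cattle water",
    "maize hunting land water",
    "navajo reservation water land rights",
]
theta, top_words = lda_gibbs(docs, n_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
print(top_words)
```

Unlike the K-Means labels, each row of `theta` is a distribution over topics, so a document that discusses both themes can carry substantial weight on each.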