Clustering algorithms are similar
• All algorithms start with the same bag-of-words representation
• It is preferable to use an algorithm that can assign multiple topics to a single document
• The quality of the result is highly dependent on preprocessing
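As a minimal sketch of the shared bag-of-words starting point, the Python below turns each document into term counts and lines them up against one sorted vocabulary. Word order is discarded. The tokenizer (lowercase, keep runs of letters) and the function names are assumptions for illustration, not the preprocessing the slides have in mind.

```python
import re
from collections import Counter

def bag_of_words(doc: str) -> Counter:
    """Map a document to term counts; word order is discarded."""
    # Assumed tokenizer: lowercase, then keep runs of ASCII letters only.
    return Counter(re.findall(r"[a-z]+", doc.lower()))

def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return a sorted vocabulary and one count row per document."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))  # every term seen in any document
    rows = [[bag[term] for term in vocab] for bag in bags]  # Counter gives 0 for absent terms
    return vocab, rows

vocab, rows = document_term_matrix(["The cat sat on the mat", "The dog sat"])
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(rows)   # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Every clustering method then works from rows like these. This is why the tokenization choices made here carry through to the final topics.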
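The sketch below contrasts single-topic and multi-topic assignment. It assumes each document already has a topic-weight vector, such as a topic model like LDA produces. The weights shown are toy values, and the function names and the 0.2 threshold are hypothetical. A hard assignment keeps only the top topic. A soft assignment keeps every topic above the threshold.

```python
def hard_assign(weights: list[float]) -> int:
    """Single-topic assignment: index of the highest-weight topic."""
    return max(range(len(weights)), key=weights.__getitem__)

def soft_assign(weights: list[float], threshold: float = 0.2) -> list[int]:
    """Multi-topic assignment: every topic whose weight meets the threshold."""
    return [k for k, w in enumerate(weights) if w >= threshold]

# Toy topic weights for one document (illustrative values only).
doc_weights = [0.45, 0.40, 0.15]
print(hard_assign(doc_weights))  # 0
print(soft_assign(doc_weights))  # [0, 1]
```

In this example the document is nearly split between topics 0 and 1. A hard assignment hides topic 1, while the soft assignment keeps it.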
Clustering is limited
• Short documents (or documents with limited metadata) can be difficult to categorize
• All methods produce some junk topics
• When should topics be frozen?
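The question of when to freeze topics is left open here. One possible heuristic is to rerun the clustering and freeze once each topic's top words stop changing much between runs. This heuristic is an assumption, not something proposed above. The sketch compares top-word sets by Jaccard overlap, and the function names and the 0.8 cutoff are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two top-word sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def topics_stable(prev: list[set[str]], curr: list[set[str]], cutoff: float = 0.8) -> bool:
    """True if both runs have the same number of topics and every current
    topic closely matches some topic from the previous run."""
    if len(prev) != len(curr):
        return False
    return all(max((jaccard(c, p) for p in prev), default=0.0) >= cutoff for c in curr)

run1 = [{"goal", "match", "team"}, {"vote", "party", "poll"}]
run2 = [{"goal", "match", "team"}, {"vote", "party", "tax"}]
print(topics_stable(run1, run1))  # True
print(topics_stable(run1, run2))  # False: second topic overlaps only 2/4
```

A junk topic tends to show up here as one that never finds a stable match from run to run. That is only a signal, though, and a person still decides whether to keep it.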
Human input is required
• The clustering algorithm is automated, but everything else isn’t
• Preprocessing (tokenization and stopword removal) is key
• A human is needed to interpret topics and assign labels
• A human is needed to choose the number of topics
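The following is a minimal sketch of the tokenization and stopword-removal step. The regex tokenizer and the deliberately tiny stopword list are assumptions for illustration. In practice the list is usually longer and tuned by a person to the corpus.

```python
import re

# Hypothetical, deliberately small stopword list; extend it for your corpus.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

def preprocess(doc: str) -> list[str]:
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The Cat sat on the mat"))  # ['cat', 'sat', 'mat']
```

Without the stopword filter, "the" would be the most frequent term in this example. That is the kind of noise that surfaces later as a junk topic.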
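Interpreting and labeling topics remains a human task. A common aid is to list each topic's highest-weighted words on a sheet that a person fills in. The sketch assumes topic-word weights are available as one dict per topic, which is a hypothetical format. The function names are also illustrative.

```python
def top_words(topic_weights: dict[str, float], n: int = 3) -> list[str]:
    """Return the n highest-weight words, breaking ties alphabetically."""
    ranked = sorted(topic_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

def labeling_sheet(topics: list[dict[str, float]], n: int = 3) -> str:
    """One line per topic with its top words and a blank for a human label."""
    return "\n".join(
        f"Topic {k}: {', '.join(top_words(t, n))}  label: ____"
        for k, t in enumerate(topics)
    )

topics = [{"goal": 0.30, "team": 0.25, "match": 0.20, "the": 0.01}]
print(labeling_sheet(topics))  # Topic 0: goal, team, match  label: ____
```

The code only ranks words. Deciding that this topic means "sports", or that it is junk, is left to the reader of the sheet.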