Topic Detection by Clustering Keywords

  • Authors:
  • Christian Wartena;Rogier Brussee

  • Venue:
  • DEXA '08 Proceedings of the 2008 19th International Conference on Database and Expert Systems Application
  • Year:
  • 2008

Abstract

We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms cosine similarity. In particular, a newly proposed term distribution that takes the co-occurrence of terms into account gives the best results.
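The abstract combines three ingredients: a per-keyword term distribution built from co-occurrence, a distance based on the Jensen-Shannon divergence (compared against cosine similarity), and bisecting clustering of the keywords. The Python below is a minimal sketch of how these pieces could fit together, not the authors' implementation. Several things are assumptions: the co-occurrence distribution is taken to be p_z(t) = sum_d P(d|z) P(t|d), with P(d|z) = n(d,z)/n(z) and P(t|d) = n(d,t)/n(d); the divergence itself (in bits) is used directly as the distance; and the induced k-bisecting algorithm is replaced by a simplified bisecting 2-medoid split. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_distribution(term, docs):
    """Assumed form: p_z(t) = sum_d P(d|z) * P(t|d).

    P(d|z) = n(d, z) / n(z) weights each document by how often `term` occurs
    in it; P(t|d) = n(d, t) / n(d) is that document's word distribution.
    """
    counts = [Counter(doc) for doc in docs]
    n_z = sum(c[term] for c in counts)
    dist = Counter()
    for c, doc in zip(counts, docs):
        if c[term] == 0:
            continue  # documents without the term contribute nothing
        weight = c[term] / n_z
        for t, n in c.items():
            dist[t] += weight * n / len(doc)
    return dist


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if supports are disjoint."""
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture `m`
        return sum(v * math.log2(v / m[t]) for t, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def cosine(p, q):
    """Cosine similarity of two sparse vectors, for comparison with the JSD."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def bisect(cluster, dist, iters=10):
    """Hypothetical simplified split: seed with the most distant pair, refine medoids."""
    pairs = [(x, y) for x in cluster for y in cluster if x < y]
    a, b = max(pairs, key=lambda pair: dist(*pair))
    left, right = cluster[:1], cluster[1:]  # fallback if all distances are zero
    for _ in range(iters):
        new_left = [x for x in cluster if dist(x, a) <= dist(x, b)]
        new_right = [x for x in cluster if dist(x, a) > dist(x, b)]
        if not new_right:
            break
        left, right = new_left, new_right
        medoid_a = min(left, key=lambda c: sum(dist(c, x) for x in left))
        medoid_b = min(right, key=lambda c: sum(dist(c, x) for x in right))
        if (medoid_a, medoid_b) == (a, b):
            break  # medoids unchanged: the split has converged
        a, b = medoid_a, medoid_b
    return left, right


def bisecting_cluster(items, dist, k):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    clusters = [sorted(items)]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break
        clusters.remove(largest)
        clusters.extend(bisect(largest, dist))
    return clusters


# Toy corpus (hypothetical): two sports documents and two politics documents.
docs = [
    "goal striker match league goal".split(),
    "match striker league season".split(),
    "election vote party parliament".split(),
    "party vote minister parliament election".split(),
]
keywords = ["goal", "striker", "league", "vote", "party", "parliament"]
dists = {w: cooccurrence_distribution(w, docs) for w in keywords}
clusters = bisecting_cluster(keywords, lambda x, y: jensen_shannon(dists[x], dists[y]), k=2)
print(clusters)
```

On this toy corpus the sports keywords and the politics keywords end up in separate clusters, because keywords from different topics have disjoint co-occurrence distributions (divergence of 1 bit) while keywords from the same topic overlap heavily. Medoids are used instead of centroids so that the split needs only pairwise distances, which suits a divergence that is not defined through a vector-space norm.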