Interactive text document clustering using feature labeling

  • Authors:
  • Seyednaser Nourashrafeddin;Evangelos Milios;Dirk Arnold

  • Affiliations:
  • Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada

  • Venue:
  • Proceedings of the 2013 ACM symposium on Document engineering
  • Year:
  • 2013

Abstract

We propose an interactive text document clustering method based on term labeling. In each iteration, the algorithm asks the user to cluster the top keyterms associated with the current document clusters, and the resulting keyterm clusters guide the clustering. Rather than relying on a standard clustering algorithm, we propose a new text clusterer that works from term clusters. First, the terms that occur in the document corpus are clustered. The term clusters are then distilled with a greedy approach to remove non-discriminative general terms. Next, a heuristic extracts seed documents associated with each distilled term cluster. Finally, these seeds are used to cluster all documents. We compared our interactive term labeling with a baseline interactive term selection algorithm on several real, standard text datasets. The experiments show that, with a comparable amount of user effort, our term labeling is more effective than the baseline term selection method.
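
The pipeline described in the abstract (term clusters, distillation, seed documents, final document clustering) can be illustrated with a rough sketch. The abstract does not specify the greedy distillation step or the seed heuristic, so the code below substitutes simple stand-ins and should not be read as the authors' method: distillation is approximated by a document-frequency filter, a seed is taken to be any document that matches exactly one distilled term cluster, and documents are assigned to the nearest seed centroid by cosine similarity. All function names, the `max_df` parameter, and the toy data are assumptions made for illustration. The input `term_clusters` stands in for the keyterm clusters a user would supply in one interaction round.

```python
from collections import Counter
import math


def distill(term_clusters, docs, max_df=0.5):
    """Stand-in for distillation (NOT the paper's greedy approach):
    drop terms found in more than max_df of the documents, and terms
    that belong to more than one term cluster."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    owners = Counter(t for c in term_clusters for t in c)
    return [{t for t in c if df[t] / n <= max_df and owners[t] == 1}
            for c in term_clusters]


def pick_seeds(term_clusters, docs):
    """Assumed seed heuristic: a document is a seed for cluster k
    if it contains terms from cluster k and from no other cluster."""
    seeds = [[] for _ in term_clusters]
    for i, d in enumerate(docs):
        matched = [k for k, c in enumerate(term_clusters) if c & set(d)]
        if len(matched) == 1:
            seeds[matched[0]].append(i)
    return seeds


def centroid(doc_ids, docs):
    """Sum of the raw term counts of the given documents."""
    total = Counter()
    for i in doc_ids:
        total.update(docs[i])
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_documents(term_clusters, docs, max_df=0.5):
    """Distill term clusters, extract seeds, then label every document
    with the index of its most similar seed centroid."""
    distilled = distill(term_clusters, docs, max_df)
    seeds = pick_seeds(distilled, docs)
    centroids = [centroid(s, docs) for s in seeds]
    return [max(range(len(centroids)),
                key=lambda k: cosine(Counter(d), centroids[k]))
            for d in docs]


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["goal", "match", "team", "the"],
    ["team", "score", "goal", "the"],
    ["stock", "market", "price", "the"],
    ["market", "bank", "price", "the"],
    ["match", "score", "the"],
]
# Hypothetical user-provided keyterm clusters; "the" is a general term.
term_clusters = [{"goal", "team", "the"}, {"market", "price", "the"}]

labels = cluster_documents(term_clusters, docs)
print(labels)
```

In this toy run, the general term "the" is filtered out of both term clusters, the first four documents become seeds, and the fifth document, which contains no distilled keyterm, is assigned through its similarity to the seed centroids. A fuller treatment would repeat the process, presenting the top keyterms of the resulting clusters to the user for relabeling.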