Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

  • Authors: Sajib Dasgupta; Vincent Ng
  • Affiliations: Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX (both authors)
  • Venue: Journal of Artificial Intelligence Research
  • Year: 2010

Abstract

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or on having a human construct a feature space interactively during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.