Topic Discovery from Text Using Aggregation of Different Clustering Methods

  • Authors:
  • Hanan Ayad;Mohamed S. Kamel

  • Venue:
  • AI '02 Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
  • Year:
  • 2002

Abstract

Cluster analysis is an unsupervised learning technique that is widely used for topic discovery from text. The research presented here proposes a novel unsupervised learning approach based on aggregating clusterings produced by different clustering techniques. By examining and combining two different clusterings of a document collection, the aggregation aims to reveal a better structure of the data rather than one imposed or constrained by the clustering method itself. Once clusters of documents are formed, a process called topic extraction picks terms from the feature space (i.e., the vocabulary of the whole collection) to describe the topic of each cluster. It is proposed at this stage to re-compute term weights according to the revealed cluster structure. The work further investigates the adaptive setup of the parameters required by the clustering and aggregation techniques. Finally, a topic accuracy measure is developed and used along with the F-measure to evaluate and compare the extracted topics and the clustering quality, respectively, before and after the aggregation. Experimental evaluation shows that the aggregation can improve both the clustering quality and the topic accuracy over the individual clustering techniques.
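
The abstract does not say which form of the F-measure is used to score clustering quality. As an illustration only, the sketch below computes one common class-weighted variant: each reference class is matched to the cluster that gives it the highest F-score, and those best scores are averaged with weights proportional to class size. The function name clustering_f_measure and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure of a clustering against reference classes.

    Assumption: this is one widely used definition, not necessarily the
    exact formula used in the paper.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    class_sizes = Counter(true_labels)       # documents per reference class
    cluster_sizes = Counter(cluster_labels)  # documents per cluster
    overlap = Counter(zip(true_labels, cluster_labels))  # per (class, cluster)

    # Best F-score achieved by any cluster for each reference class.
    best = {c: 0.0 for c in class_sizes}
    for (c, k), n_ck in overlap.items():
        precision = n_ck / cluster_sizes[k]
        recall = n_ck / class_sizes[c]
        f_score = 2 * precision * recall / (precision + recall)
        best[c] = max(best[c], f_score)

    # Weight each class by its share of the collection.
    return sum(class_sizes[c] / n * best[c] for c in class_sizes)


if __name__ == "__main__":
    # Toy data (hypothetical): four documents, two reference topics.
    reference = ["sports", "sports", "politics", "politics"]
    before = [0, 0, 0, 1]  # one document placed in the wrong cluster
    after = [0, 0, 1, 1]   # clustering that matches the reference
    print(clustering_f_measure(reference, before))
    print(clustering_f_measure(reference, after))
```

Comparing the score of each individual clustering with the score of the aggregated clustering, as in the toy "before" and "after" lists, mirrors the kind of before-and-after comparison the abstract describes.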