DClusterE: A Framework for Evaluating and Understanding Document Clustering Using Visualization

  • Authors:
  • Yi Zhang;Tao Li

  • Affiliations:
  • Florida International University;Florida International University

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Over the last decade, document clustering, as one of the key tasks in information organization and navigation, has been widely studied. Many algorithms have been developed for addressing various challenges in document clustering and for improving clustering performance. However, relatively few research efforts have been reported on evaluating and understanding document clustering results. In this article, we present DClusterE, a comprehensive and effective framework for document clustering evaluation and understanding using information visualization. DClusterE integrates cluster validation with user interactions and offers rich visualization tools for users to examine document clustering results from multiple perspectives. In particular, through informative views including force-directed layout view, matrix view, and cluster view, DClusterE provides not only different aspects of document inter/intra-clustering structures, but also the corresponding relationship between clustering results and the ground truth. Additionally, DClusterE supports general user interactions such as zoom in/out, browsing, and interactive access of the documents at different levels. Two new techniques are proposed to implement DClusterE: (1) A novel multiplicative update algorithm (MUA) for matrix reordering to generate narrow-banded (or clustered) nonzero patterns from documents. Combined with coarse seriation, MUA is able to provide better visualization of the cluster structures. (2) A Mallows-distance-based algorithm for establishing the relationship between the clustering results and the ground truth, which serves as the basis for coloring schemes. Experiments and user studies are conducted to demonstrate the effectiveness and efficiency of DClusterE.