OMES: a new evaluation strategy using optimal matching for document clustering

  • Authors:
  • Xiaojun Wan

  • Affiliations:
  • Peking University, Beijing, China

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Existing measures for evaluating clustering results (e.g. F-measure) have the limitation of overestimating cluster quality because they usually adopt the greedy matching between classes (reference clusters) and clusters (system clusters) to allow multiple classes to correspond to one same cluster, which is in fact a locally optimal solution. This paper proposes a new evaluation strategy to overcome the limitation of existing evaluation measures by using optimal matching in graph theory. A weighted bipartite graph is built with classes and clusters as two disjoint sets of vertices and the edge weight between any class and any cluster is computed using a basic metric. Then the total weight of the optimal matching in the graph is acquired and we use it to evaluate the quality of the clusters. The optimal matching allows only one-to-one matching between classes and clusters and a globally optimal solution can be achieved. A preliminary study is performed to demonstrate the effectiveness of the proposed evaluation strategy.