A hierarchical monothetic document clustering algorithm for summarization and browsing search results

  • Authors:
  • Krishna Kummamuru;Rohit Lotlikar;Shourya Roy;Karan Singal;Raghu Krishnapuram

  • Affiliations:
  • IBM India Research Lab, New Delhi;IBM India Research Lab, New Delhi;IBM India Research Lab, New Delhi;Indian Institute of Technology, Guwahati;IBM India Research Lab, New Delhi

  • Venue:
  • Proceedings of the 13th international conference on World Wide Web
  • Year:
  • 2004

Abstract

Organizing Web search results into a hierarchy of topics and sub-topics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierarchical monothetic clustering algorithm to build a topic hierarchy for a collection of search results retrieved in response to a query. At every level of the hierarchy, the new algorithm progressively identifies topics in a way that maximizes coverage while maintaining the distinctiveness of the topics. We refer to the proposed algorithm as DisCover. Evaluating the quality of a topic hierarchy is a non-trivial task, the ultimate test being user judgment. We use several objective measures, such as coverage and reach time, to compare the proposed algorithm empirically with two other monothetic clustering algorithms and to demonstrate its superiority. Even though our algorithm is slightly more computationally intensive than one of these algorithms, it generates better hierarchies. Our user studies also show that the proposed algorithm is superior to the other algorithms as a summarizing and browsing tool.
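
The abstract does not give the objective function used by DisCover, so the following is only a minimal sketch of the general idea it describes: greedy, monothetic topic selection (each topic is defined by a single term) that trades off coverage against distinctiveness, applied recursively to form a hierarchy. The function names `greedy_topics` and `build_hierarchy`, the weights `w_cov` and `w_dis`, and the particular scoring formula are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def greedy_topics(docs, k, w_cov=0.5, w_dis=0.5, exclude=frozenset()):
    """Greedily pick up to k single-term topics.

    docs maps a document id to the set of terms it contains.
    The score below is an illustrative assumption, not the DisCover objective.
    """
    # Inverted index: term -> ids of the documents that contain it.
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            if term not in exclude:
                postings[term].add(doc_id)

    chosen, covered = [], set()
    while len(chosen) < k:
        best_term, best_score = None, 0.0
        for term in sorted(postings):  # sorted for deterministic tie-breaking
            if term in chosen:
                continue
            members = postings[term]
            new = members - covered
            coverage_gain = len(new) / len(docs)      # newly covered share
            distinctiveness = len(new) / len(members)  # overlap penalty
            score = w_cov * coverage_gain + w_dis * distinctiveness
            if score > best_score:
                best_term, best_score = term, score
        if best_term is None:  # no remaining term covers anything new
            break
        chosen.append(best_term)
        covered |= postings[best_term]
    return chosen, covered


def build_hierarchy(docs, k, depth, exclude=frozenset()):
    """Apply greedy_topics recursively to obtain topics and sub-topics."""
    if depth == 0 or not docs:
        return {}
    chosen, _ = greedy_topics(docs, k, exclude=exclude)
    tree = {}
    for term in chosen:
        subset = {d: t for d, t in docs.items() if term in t}
        tree[term] = {
            "docs": sorted(subset),
            "children": build_hierarchy(subset, k, depth - 1, exclude | {term}),
        }
    return tree
```

In this sketch, coverage of a level is simply `len(covered) / len(docs)`, and the query terms would typically be passed in `exclude`, since they occur in nearly every result and make poor topic labels.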