A practical web-based approach to generating topic hierarchy for text segments

  • Authors:
  • Shui-Lung Chuang;Lee-Feng Chien

  • Affiliations:
  • Institute of Information Science, Academia Sinica, Taiwan, R.O.C.;Institute of Information Science, Academia Sinica, Taiwan, R.O.C.

  • Venue:
  • Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year:
  • 2004

Abstract

Many information systems need to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the use of highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of the text segments. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on text segments from different domains. The results show the potential of the proposed approach, which we believe can benefit many information systems.
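
The two steps named in the abstract, enriching each text segment with search-result snippets and then clustering the segments hierarchically, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the snippets are hard-coded stand-ins for real search results, the features are raw term counts, and the clustering is plain average-link agglomerative clustering rather than the paper's own algorithm. Plain agglomerative clustering yields a strictly binary tree, which is the kind of unnatural shape the abstract says the proposed approach tries to avoid.

```python
import math
from collections import Counter

# Hypothetical stand-in for search results. In practice these snippets would
# be the top-ranked results returned by a search engine for each text segment.
SNIPPETS = {
    "python": ["python programming language code", "python code library software"],
    "java": ["java programming language code", "java software virtual machine code"],
    "jaguar": ["jaguar big cat animal", "jaguar animal rainforest predator"],
    "tiger": ["tiger big cat animal", "tiger animal predator stripes"],
}


def snippet_vector(snippets):
    """Pool term counts over all snippets retrieved for one segment."""
    counts = Counter()
    for snippet in snippets:
        counts.update(snippet.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leaves(tree):
    """Return the segments under a tree node (a str leaf or a 2-tuple)."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def average_link(t1, t2, vectors):
    """Mean pairwise similarity between the segments of two clusters."""
    pairs = [(a, b) for a in leaves(t1) for b in leaves(t2)]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def build_hierarchy(snippets_by_segment):
    """Repeatedly merge the two most similar clusters into a binary tree."""
    vectors = {seg: snippet_vector(s) for seg, s in snippets_by_segment.items()}
    clusters = sorted(vectors)  # start with one singleton cluster per segment
    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]], vectors),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


hierarchy = build_hierarchy(SNIPPETS)
print(hierarchy)  # (('jaguar', 'tiger'), ('java', 'python'))
```

With these toy snippets the four segments share no words with one another directly, yet the snippet-enriched vectors separate the animal terms from the programming terms. Note that "jaguar" and "python" are both ambiguous words; which sense dominates depends entirely on the snippets retrieved, which is why the snippet source matters in a sketch like this.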