Categorizing unknown text segments for information extraction using a search result mining approach

  • Authors:
  • Chien-Chung Huang; Shui-Lung Chuang; Lee-Feng Chien

  • Affiliations:
  • Institute of Information Science, Academia Sinica, Taiwan; Institute of Information Science, Academia Sinica, Taiwan; Institute of Information Science, Academia Sinica, Taiwan

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

An advanced information extraction system requires an effective text categorization technique to categorize extracted facts (text segments) into a hierarchy of domain-specific topic categories. Text segments are often short, and their categorization is quite different from conventional document categorization. This paper proposes a Web mining approach that exploits Web resources to categorize unknown text segments with limited manual intervention. The feasibility and wide adaptability of the proposed approach have been shown through extensive experiments on categorizing different kinds of text segments, including domain-specific terms, named entities, and even paper titles, into Yahoo!’s taxonomy trees.