Enhancing Techniques for Efficient Topic Hierarchy Integration

  • Authors:
  • Jyh-Jong Tsay;Hsuan-Yu Chen;Chi-Feng Chang;Ching-Han Lin

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

In this paper, we study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our objective is to develop efficient techniques that improve the accuracy of traditional categorization methods by incorporating the categorization information provided by data sources into the categorization process. On the World-Wide Web, such categorization information is often available from the information sources themselves. We present several enhancing techniques that use this categorization information to improve traditional methods such as naive Bayes and support vector machines (SVM). Experiments on collections from Openfind and Yam, and from Google and Yahoo!, popular web sites in Taiwan and the USA, respectively, show that our techniques significantly improve classification accuracy. For example, on the data set collected from Yam and Openfind, accuracy rises from 55% to 66% for naive Bayes and from 57% to 67% for SVM.
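The abstract does not describe the enhancing techniques themselves, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one simple way of using source categorization information with naive Bayes: replacing the class prior P(c) with a smoothed estimate of P(c | s), where s is the category the document had at its source. The class name `SourceAwareNaiveBayes`, the category labels, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class SourceAwareNaiveBayes:
    """Multinomial naive Bayes whose prior can depend on a source category.

    Illustrative assumption: P(c | doc, s) is proportional to
    P(c | s) * prod_w P(w | c), with Laplace smoothing throughout.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_docs = Counter()               # documents per target class
        self.word_counts = defaultdict(Counter)   # word counts per target class
        self.source_class = defaultdict(Counter)  # target-class counts per source category
        self.vocab = set()

    def fit(self, examples):
        # Each example is (tokens, source_category, target_class).
        for tokens, source_cat, target in examples:
            self.class_docs[target] += 1
            self.word_counts[target].update(tokens)
            self.source_class[source_cat][target] += 1
            self.vocab.update(tokens)
        return self

    def _log_prior(self, c, source_cat):
        if source_cat is None:
            # Plain naive Bayes prior P(c).
            return math.log(self.class_docs[c] / sum(self.class_docs.values()))
        # Smoothed P(c | s); an unseen source category yields a uniform prior.
        counts = self.source_class.get(source_cat, Counter())
        total = sum(counts.values())
        k = len(self.class_docs)
        return math.log((counts[c] + self.alpha) / (total + self.alpha * k))

    def _log_likelihood(self, tokens, c):
        denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
        # Words never seen in training are ignored.
        return sum(
            math.log((self.word_counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )

    def predict(self, tokens, source_cat=None):
        best_class, best_score = None, float("-inf")
        for c in self.class_docs:
            score = self._log_prior(c, source_cat) + self._log_likelihood(tokens, c)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy data: (tokens, source category, target category).
training = [
    (["cheap", "flights", "hotel"], "yam/travel", "Travel"),
    (["hotel", "booking", "deals"], "yam/travel", "Travel"),
    (["stock", "market", "deals"], "yam/finance", "Finance"),
    (["bank", "stock", "rates"], "yam/finance", "Finance"),
]
model = SourceAwareNaiveBayes().fit(training)

# The word "deals" alone is ambiguous; the source category breaks the tie.
print(model.predict(["deals"], source_cat="yam/finance"))
print(model.predict(["deals"], source_cat="yam/travel"))
```

In this sketch the text evidence and the source evidence are combined in a single posterior, so a document whose words are ambiguous is steered by where it was filed at its source, while strongly indicative words can still override that hint.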