Enhancing Techniques for Efficient Topic Hierarchy Integration

Authors:
Jyh-Jong Tsay; Hsuan-Yu Chen; Chi-Feng Chang; Ching-Han Lin
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

In this paper, we study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our objective is to develop efficient techniques that improve the accuracy of traditional categorization methods by incorporating the categorization information provided by the data sources into the categorization process. Note that on the World Wide Web, categorization information is often available from the information sources themselves. We present several enhancing techniques that use this categorization information to improve traditional methods such as naive Bayes and support vector machines (SVMs). Experiments on collections from Openfind and Yam, and from Google and Yahoo!, popular web sites in Taiwan and the USA, respectively, show that our techniques significantly improve classification accuracy: for example, from 55% to 66% for naive Bayes and from 57% to 67% for SVM on the data set collected from Yam and Openfind.
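The abstract does not spell out the enhancing techniques, so the following is only an illustrative Python sketch of one way source categorization information can be folded into naive Bayes. It loosely follows the enhanced naive Bayes idea from Agrawal and Srikant's catalog-integration work and should not be read as this paper's method. It assumes documents are already tokenized, uses a plain multinomial naive Bayes with Laplace smoothing, and conditions the class prior on the source category: the documents of each source category are first classified normally, and a target class c then receives a prior proportional to |c| × (overlap(c) + 1)^w. The function name `integrate_with_source_prior`, the exponent `weight` (w), and the +1 smoothing are hypothetical choices made for illustration.

```python
import math
from collections import Counter


class MultinomialNB:
    """Plain multinomial naive Bayes over token lists, with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_prior(self, c):
        # Ordinary prior: fraction of training documents labeled c.
        return math.log(self.class_counts[c] / sum(self.class_counts.values()))

    def log_likelihood(self, tokens, c):
        # log P(d | c) with add-one smoothing over the training vocabulary.
        v = len(self.vocab)
        return sum(math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
                   for t in tokens)

    def predict(self, tokens, log_prior=None):
        # Pick the class maximizing log prior + log likelihood; a custom
        # prior function can be passed in to replace the ordinary one.
        prior = self.log_prior if log_prior is None else log_prior
        return max(self.classes,
                   key=lambda c: prior(c) + self.log_likelihood(tokens, c))


def integrate_with_source_prior(nb, source_catalog, weight=1.0):
    """Classify each source category's documents into the target classes.

    source_catalog maps a source category name to a list of token lists.
    weight is the hypothetical exponent w controlling how strongly the
    source grouping biases the prior (w = 0 gives plain naive Bayes).
    """
    result = {}
    for src_cat, docs in source_catalog.items():
        # Pass 1: plain naive Bayes estimates how this source category
        # overlaps each target class.
        overlap = Counter(nb.predict(d) for d in docs)
        # Source-conditioned prior: P(c | src) proportional to
        # |c| * (overlap(c) + 1) ** w; the +1 keeps every class possible.
        scores = {c: nb.class_counts[c] * (overlap[c] + 1) ** weight
                  for c in nb.classes}
        z = sum(scores.values())

        def src_prior(c, scores=scores, z=z):
            return math.log(scores[c] / z)

        # Pass 2: reclassify every document with the source-aware prior.
        result[src_cat] = [nb.predict(d, log_prior=src_prior) for d in docs]
    return result
```

With `weight=0` the source-conditioned prior reduces to the ordinary class prior, so the sketch falls back to plain naive Bayes; larger values of `weight` trust the source's grouping more, which helps documents whose own words are ambiguous between target classes.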