Fully Automatic Text Categorization by Exploiting WordNet

Authors:
Jianqiang Li;Yu Zhao;Bo Liu
Affiliations:
NEC Laboratories China, Beijing, China 100084;NEC Laboratories China, Beijing, China 100084;NEC Laboratories China, Beijing, China 100084
Venue:
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Year:
2009

Citing 15
Cited 1

The nature of statistical learning theory

The nature of statistical learning theory
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Using WordNet and Lexical Operators to Improve Internet Searches

IEEE Internet Computing
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms

Journal of Intelligent Information Systems
Introduction to the special issue on word sense disambiguation: the state of the art

Computational Linguistics - Special issue on word sense disambiguation
Automatic text categorization by unsupervised learning

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Investigating unsupervised learning for text categorization bootstrapping

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A characterization of wordnet features in Boolean models for text classification

AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
Text classification by labeling words

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence

A semantic term weighting scheme for text categorization

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a Fully Automatic Categorization approach for Text (FACT) by exploiting the semantic features from WordNet and document clustering. In FACT, the training data is constructed automatically by using the knowledge of the category name. With the support of WordNet, it first uses the category name to generate a set of features for the corresponding category. Then, a set of documents is labeled according to such features. To reduce the possible bias originating from the category name and generated features, document clustering is used to refine the quality of initial labeling. The training data are subsequently constructed to train the discriminative classifier. The empirical experiments show that the best performance of FACT can achieve more than 90% of the baseline SVM classifiers in F1 measure, which demonstrates the effectiveness of the proposed approach.