A New Text Categorization Technique Using Distributional Clustering and Learning Logic

Authors:
Hisham Al-Mubaid; Syed A. Umair
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006

Abstract

Text categorization remains one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. In this paper, we present a new text categorization method that combines distributional clustering of words with a learning logic technique, called Lsquare, for constructing text classifiers. The high dimensionality of text documents hinders categorization, and feature clustering has proven to be an effective alternative to feature selection for reducing dimensionality. We therefore use a distributional clustering method, the Information Bottleneck (IB), to generate an efficient representation of documents and apply Lsquare to train text classifiers. The method was extensively tested and evaluated. It achieves higher or comparable classification accuracy and F1 results compared with SVM under identical experimental settings with a small number of training documents on three benchmark data sets: WebKB, 20 Newsgroups, and Reuters-21578. The results indicate that the method is a good choice for applications with a limited amount of labeled training data. We also demonstrate the effect of changing the training set size on the classification performance of the learners.
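
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how a word-to-cluster mapping shrinks a document representation from one feature per word to one feature per cluster, and how the F1 measure is computed from predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the cluster assignment in WORD_TO_CLUSTER is a hand-made, hypothetical example (the paper obtains clusters with IB, which is not implemented here), and the function names cluster_features and f1_score are illustrative only. Lsquare training is also not shown.

```python
from collections import Counter

# Hypothetical word-to-cluster assignment, made up for illustration.
# In the paper this mapping would come from IB distributional clustering.
WORD_TO_CLUSTER = {
    "course": 0, "lecture": 0, "homework": 0,
    "professor": 1, "faculty": 1,
    "research": 2, "publication": 2,
}
NUM_CLUSTERS = 3


def cluster_features(tokens):
    """Map a tokenized document to a vector of per-cluster token counts.

    Words without a cluster assignment are ignored.
    """
    counts = Counter(
        WORD_TO_CLUSTER[t] for t in tokens if t in WORD_TO_CLUSTER
    )
    return [counts.get(c, 0) for c in range(NUM_CLUSTERS)]


def f1_score(true_labels, predicted_labels, positive=1):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    doc = ["course", "homework", "professor", "the"]
    print(cluster_features(doc))                     # [2, 1, 0]
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))      # 0.5
```

With seven vocabulary words mapped to three clusters, each document becomes a three-dimensional count vector regardless of vocabulary size, which is the dimensionality reduction the abstract refers to; any classifier could then be trained on these vectors.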