Text classification using sentential frequent itemsets

Authors:
Shi-Zhu Liu; He-Ping Hu
Affiliations:
College of Computer Science, Huazhong University of Science and Technology, Wuhan, China; College of Computer Science, Huazhong University of Science and Technology, Wuhan, China
Venue:
Journal of Computer Science and Technology
Year:
2007

Abstract

Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially the more specific ones, are usually conveyed by sets of terms. To build a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments on the Reuters and newsgroup corpora validate the practicability of the proposed system.
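
To make the idea of treating a sentence as a transaction concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a naive punctuation-based sentence splitter, lowercase alphabetic tokens, an absolute support threshold, and brute-force enumeration of itemsets up to a small size. The function names `sentence_transactions` and `sentential_frequent_itemsets` and the parameters `min_support` and `max_size` are hypothetical. The variable precision rough set weighting described in the abstract is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(text):
    """Split text into sentences; each sentence becomes a set of word tokens.

    Assumption: sentences end at '.', '!' or '?'; tokens are lowercase
    alphabetic runs. A real system would use a proper tokenizer.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:  # skip empty fragments left by the split
            transactions.append(words)
    return transactions


def sentential_frequent_itemsets(documents, min_support=2, max_size=2):
    """Count word sets that co-occur within a sentence across all documents.

    Returns a dict mapping each itemset (a sorted tuple of words) to the
    number of sentences containing it, keeping only those whose count is
    at least min_support. Brute-force enumeration, for illustration only.
    """
    counts = Counter()
    for doc in documents:
        for words in sentence_transactions(doc):
            for size in range(1, max_size + 1):
                for itemset in combinations(sorted(words), size):
                    counts[itemset] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


# Toy corpus (made up for this example, not from the paper's data sets).
docs = [
    "Oil prices rose sharply. Crude oil exports fell.",
    "Oil prices fell again. Wheat exports rose.",
]
itemsets = sentential_frequent_itemsets(docs, min_support=2, max_size=2)
print(itemsets)
```

In this toy corpus, "exports" and "rose" appear together in both documents but share a sentence only once, so the pair is not frequent at the sentence level, whereas "oil" and "prices" share a sentence twice and are kept. This is the distinction the sentence-as-transaction view is meant to capture.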