Adapting associative classification to text categorization

Authors:
Baoli Li;Neha Sugandh;Ernest V. Garcia;Ashwin Ram
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology;Emory University;Georgia Institute of Technology
Venue:
Proceedings of the 2007 ACM symposium on Document engineering
Year:
2007

Citing 2

CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

Text Document Categorization by Term Association
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

Cited 4

No mining, no meaning: relating documents across repositories with ontology-driven information extraction
Proceedings of the eighth ACM symposium on Document engineering

Symptom-based problem determination using log data abstraction
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research

Bridging the gap between software architecture rationale formalisms and actual architecture documents: An ontology-driven approach
Science of Computer Programming

Two scalable algorithms for associative text classification
Information Processing and Management: an International Journal

Abstract

Associative classification, which originates from numerical data mining, has recently been applied to text data. Text is first converted into a database of transactions, and training and prediction are then carried out on the derived numerical dataset. This intuitive strategy has demonstrated quite good performance. However, it does not fully exploit the inherent characteristics of text data, even though it must handle some text-specific problems, such as lemmatization and stemming, during conversion. In this paper, we propose a bottom-up strategy for adapting associative classification to text categorization that takes the structure information of text into account. Experiments on the Reuters-21578 dataset show that the proposed strategy can make use of text structure information and achieve better performance.
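
To make the baseline strategy described in the abstract concrete, the following is a minimal sketch of turning documents into transactions and mining class-association rules from them. It illustrates only the generic "convert, then mine" pipeline, not the bottom-up, structure-aware method the paper proposes. The function names, the whitespace tokenizer, the thresholds, and the toy documents are all assumptions made for illustration; a realistic system would add stemming or lemmatization and a proper frequent-itemset miner instead of brute-force counting.

```python
from collections import Counter
from itertools import combinations


def to_transaction(text):
    """Turn a document into a transaction: the set of its lowercase tokens.

    Assumption: plain whitespace tokenization, with no stemming or lemmatization.
    """
    return frozenset(text.lower().split())


def mine_rules(docs, labels, min_support=2, min_confidence=0.6, max_len=2):
    """Mine class-association rules (itemset -> label) by brute-force counting.

    Hypothetical helper: thresholds and max_len are illustrative defaults.
    """
    itemset_counts = Counter()  # how many documents contain each itemset
    rule_counts = Counter()     # how many documents contain itemset AND have label
    for text, label in zip(docs, labels):
        items = sorted(to_transaction(text))
        for size in range(1, max_len + 1):
            for itemset in combinations(items, size):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), count in rule_counts.items():
        confidence = count / itemset_counts[itemset]
        if count >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, count, confidence))
    # Rank rules by confidence first, then by support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


def classify(text, rules, default=None):
    """Return the label of the best-ranked rule whose itemset occurs in the text."""
    items = to_transaction(text)
    for itemset, label, _count, _confidence in rules:
        if items.issuperset(itemset):
            return label
    return default


# Toy data, invented for this sketch (not taken from Reuters-21578).
TOY_DOCS = [
    "wheat harvest exports rise",
    "wheat prices fall",
    "oil prices rise",
    "crude oil exports fall",
]
TOY_LABELS = ["grain", "grain", "energy", "energy"]
TOY_RULES = mine_rules(TOY_DOCS, TOY_LABELS)
```

On this toy data, only the rules `wheat -> grain` and `oil -> energy` pass both thresholds, so `classify("wheat shipments delayed", TOY_RULES)` returns `"grain"`. Note that the transaction representation discards word order and document structure, which is the kind of information the abstract says the proposed strategy takes into account.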