Sequential patterns for text categorization

Authors:
S. Jaillet; A. Laurent; M. Teisseire
Affiliations:
LIRMM-CNRS - Université Montpellier, Montpellier Cedex, France; LIRMM-CNRS - Université Montpellier, Montpellier Cedex, France; LIRMM-CNRS - Université Montpellier, Montpellier Cedex, France
Venue:
Intelligent Data Analysis
Year:
2006

Abstract

Text categorization is a well-known task, addressed mainly with statistical approaches such as neural networks, Support Vector Machines (SVM), and other machine learning algorithms. Texts are generally treated as bags of words, with no regard for word order. Although these approaches have proven to be efficient, they do not provide users with comprehensive and reusable rules about their data. Such rules are, however, very important for users who need to describe trends in the data they analyze. In this context, an association-rule-based approach, CBA, was proposed by Bing Liu. In this paper we extend that approach by using sequential patterns, in the SPaC method (Sequential Patterns for Classification) for text categorization. Taking order into account lets us represent the succession of words through a document without the complex and time-consuming representations and processing required by natural-language and grammatical methods. The original method we propose consists in mining sequential patterns in order to build a classifier. We show experimentally that our proposal is relevant and that it compares favorably with other methods. In particular, our method outperforms CBA and gives better results than SVM on some corpora.
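
To make the idea of order-aware patterns concrete, here is a minimal Python sketch. It is not the SPaC algorithm, whose details the abstract does not give. It assumes the simplest possible sequential pattern, an ordered pair of words (a appears somewhere before b), keeps the pairs that reach a minimum support within each category, and assigns a new document to the category whose patterns it matches most often. The function names, the toy corpus, the support threshold, and the scoring rule are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def ordered_pairs(words):
    """Return the set of pairs (a, b) where a occurs before b in the document."""
    # combinations() preserves positional order, so (a, b) means a precedes b.
    return set(combinations(words, 2))


def mine_patterns(docs, min_support):
    """Keep the ordered pairs found in at least min_support (a fraction) of docs."""
    counts = Counter()
    for words in docs:
        counts.update(ordered_pairs(words))  # each pair counted once per document
    threshold = min_support * len(docs)
    return {pair for pair, count in counts.items() if count >= threshold}


def train(labelled_docs, min_support):
    """Mine one pattern set per category (hypothetical training step)."""
    by_category = {}
    for words, category in labelled_docs:
        by_category.setdefault(category, []).append(words)
    return {cat: mine_patterns(docs, min_support)
            for cat, docs in by_category.items()}


def classify(model, words):
    """Pick the category whose patterns the document matches most often."""
    found = ordered_pairs(words)
    scores = {cat: len(patterns & found) for cat, patterns in model.items()}
    return max(scores, key=scores.get)


# Toy corpus, invented for illustration only.
training = [
    ("stocks fell after the bank raised rates".split(), "finance"),
    ("the bank raised rates and stocks fell".split(), "finance"),
    ("the team won after the striker scored".split(), "sport"),
    ("the striker scored and the team won".split(), "sport"),
]
model = train(training, min_support=1.0)

print(classify(model, "the bank raised rates".split()))
print(classify(model, "the striker scored".split()))
```

Unlike a bag-of-words representation, the pattern ("stocks", "fell") is kept for the finance category while the reversed pair ("fell", "stocks") is not, because only the first ordering occurs in every finance document.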