Feature generation for sequence categorization

Authors:
Daniel Kudenko; Haym Hirsh
Venue:
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Year:
1998

References (6):
C4.5: programs for machine learning.
Technical Note: Selecting a Classification Method by Cross-Validation. Machine Learning.
Learning to recognize promoter sequences in E. coli by modeling uncertainty in the training data. AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1).
Representation change in machine learning. AI Communications.
Representing sequences in description logics. AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence.
Transferring and retraining learned information filters. AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence.

Cited by (11):
Mining features for sequence classification. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
Scalable Feature Mining for Sequential Data. IEEE Intelligent Systems.
Evaluation of Techniques for Classifying Biological Sequences. PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
XRules: An effective algorithm for structural classification of XML data. Machine Learning.
Learning recurrent behaviors from heterogeneous multivariate time-series. Artificial Intelligence in Medicine.
Effective temporal data classification by integrating sequential pattern mining and probabilistic induction. Expert Systems with Applications: An International Journal.
Wikipedia-based semantic interpretation for natural language processing. Journal of Artificial Intelligence Research.
Feature generation for text categorization using world knowledge. IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence.
A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter.
Protein sequence classification through relevant sequence mining and bayes classifiers. EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence.
Feature discovery in classification problems. IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis.

Abstract

Sequence categorization is the problem of generalizing from a corpus of labeled sequences to procedures that accurately label future, unlabeled sequences. The choice of sequence representation can have a major impact on this task: in the absence of background knowledge, a good representation is often unknown, and straightforward representations are often far from optimal. We propose a feature generation method, FGEN, that creates Boolean features checking for the presence or absence of heuristically selected collections of subsequences. We show empirically that adding FGEN's features to existing representations of sequence data improves the accuracy of two commonly used learning systems, C4.5 and Ripper. We demonstrate this advantage across a range of tasks selected from three domains: DNA sequences, Unix command sequences, and English text.
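The abstract describes FGEN's features only at a high level, so the Python sketch below illustrates the general idea of collection-based Boolean features under assumptions that are not taken from the paper: a "subsequence" is treated as a contiguous run of tokens, a collection feature fires when at least `min_present` of its members occur (all members by default), and candidate substrings are ranked by a simple class-difference score that stands in for FGEN's own selection heuristic, which this page does not describe. All function names (`contains`, `collection_feature`, `augment`, `ngrams`, `select_substrings`) and the toy data are hypothetical. The same code accepts strings (DNA) or lists of tokens (Unix commands, English words).

```python
from typing import Hashable, List, Optional, Sequence, Tuple

# A substring is represented as a tuple of tokens (characters or words).
Substring = Tuple[Hashable, ...]


def contains(seq: Sequence[Hashable], sub: Sequence[Hashable]) -> bool:
    """True if `sub` occurs as a contiguous run of tokens in `seq` (assumed semantics)."""
    m = len(sub)
    target = tuple(sub)
    return any(tuple(seq[i:i + m]) == target for i in range(len(seq) - m + 1))


def collection_feature(seq, collection, min_present: Optional[int] = None) -> bool:
    """Hypothetical Boolean feature: True when at least `min_present` members of
    `collection` occur in `seq` (all of them by default)."""
    needed = len(collection) if min_present is None else min_present
    hits = sum(contains(seq, sub) for sub in collection)
    return hits >= needed


def augment(existing, seq, collections, min_present=None) -> list:
    """Append one Boolean feature per collection to an existing feature vector."""
    return list(existing) + [collection_feature(seq, c, min_present) for c in collections]


def ngrams(seq, k: int) -> set:
    """All distinct length-k contiguous substrings of `seq`, as tuples."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def select_substrings(sequences, labels, k: int, top: int) -> List[Substring]:
    """Stand-in selection heuristic (NOT FGEN's): rank length-k substrings by the
    absolute difference in how often they occur in each of two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    by_class = {c: [s for s, y in zip(sequences, labels) if y == c] for c in classes}

    def rate(g, c):
        # Fraction of class-c sequences that contain substring g.
        members = by_class[c]
        return sum(contains(s, g) for s in members) / len(members)

    candidates = set().union(*(ngrams(s, k) for s in sequences))
    # Larger class difference first; ties broken by the substring itself.
    ranked = sorted(
        candidates,
        key=lambda g: (-abs(rate(g, classes[0]) - rate(g, classes[1])), g),
    )
    return ranked[:top]


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTGACA", "ACGGTA", "TTGTCA"]  # invented toy DNA-like data
    labels = [1, 0, 1, 0]
    chosen = select_substrings(seqs, labels, k=2, top=2)
    print(chosen)  # [('C', 'A'), ('C', 'G')]
    collections = [[("C", "G")], [("C", "A"), ("T", "T")]]
    print(augment([0.5], "ACGTAC", collections))  # [0.5, True, False]
```

Appending the Boolean features to an existing vector, as `augment` does, mirrors the abstract's setup of adding new features to an existing representation before the data are passed to a learner such as C4.5 or Ripper; the learner itself is not shown.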