Effective temporal data classification by integrating sequential pattern mining and probabilistic induction

Authors:
Vincent S. Tseng; Chao-Hui Lee
Affiliations:
Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan, ROC; Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

Data classification is an important topic in data mining because of its wide range of applications. Many classification methods have been proposed based on well-known learning models such as decision trees and neural networks. Although data classification has been widely discussed, relatively few studies have explored temporal data classification. Most existing work focuses on improving classification accuracy using statistical models, neural networks, or distance-based methods, but these approaches cannot explain their classification results to users. In many research settings, such as microarray gene expression analysis, users prefer interpretable classification information to a classifier that offers only high accuracy. In this paper, we propose a novel pattern-based data mining method, classify-by-sequence (CBS), for classifying large temporal datasets. The main idea behind CBS is to integrate sequential pattern mining with probabilistic induction. CBS is simple to implement, and its pattern-based architecture supplies clear classification information to users. In experimental evaluation, CBS delivered highly accurate classification results on two real time series datasets. In addition, we designed a simulator to evaluate the performance of CBS on datasets with different characteristics. The experimental results show that CBS can discover hidden patterns and classify data effectively by using the mined sequential patterns.
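
The abstract does not spell out the CBS algorithm, so the following Python sketch only illustrates the general idea of combining sequential pattern mining with a probabilistic scoring step. It is not the authors' method. The function names (`contains`, `mine_patterns`, `train`, `classify`), the naive level-wise miner, the per-pattern class-confidence score, and the default parameters are all assumptions made for illustration, and the sequences are assumed to be already discretized into symbols.

```python
from collections import Counter, defaultdict


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (possibly non-contiguous) subsequence."""
    it = iter(seq)
    return all(sym in it for sym in pattern)


def mine_patterns(seqs, min_support, max_len):
    """Naive level-wise miner (hypothetical stand-in for a real algorithm):
    returns all subsequences of length <= max_len that are contained in at
    least `min_support` (a fraction) of `seqs`."""
    alphabet = sorted({sym for seq in seqs for sym in seq})
    threshold = min_support * len(seqs)
    frequent, frontier = [], [()]
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for sym in alphabet:
                cand = prefix + (sym,)
                if sum(contains(seq, cand) for seq in seqs) >= threshold:
                    next_frontier.append(cand)
        frequent.extend(next_frontier)
        frontier = next_frontier  # only frequent prefixes are extended
    return frequent


def train(seqs, labels, min_support=0.5, max_len=3):
    """Mine patterns per class, then estimate P(class | pattern) by counting."""
    by_class = defaultdict(list)
    for seq, y in zip(seqs, labels):
        by_class[y].append(seq)
    patterns = set()
    for class_seqs in by_class.values():
        patterns.update(mine_patterns(class_seqs, min_support, max_len))
    rules = {}
    for p in patterns:
        hits = Counter(y for seq, y in zip(seqs, labels) if contains(seq, p))
        total = sum(hits.values())
        if total:  # skip patterns that match no training sequence
            rules[p] = {y: hits[y] / total for y in by_class}
    default = Counter(labels).most_common(1)[0][0]
    return rules, default


def classify(seq, model):
    """Sum the class confidences of every matched pattern; pick the best class.
    Falls back to the most common training label when nothing matches."""
    rules, default = model
    scores = Counter()
    for p, conf in rules.items():
        if contains(seq, p):
            for y, c in conf.items():
                scores[y] += c
    return max(scores, key=scores.get) if scores else default


if __name__ == "__main__":
    # Toy, made-up symbolic sequences; not data from the paper.
    seqs = [("a", "b", "c"), ("a", "x", "b", "c"),
            ("c", "b", "a"), ("c", "b", "y", "a")]
    labels = ["up", "up", "down", "down"]
    model = train(seqs, labels)
    print(classify(("a", "b", "z", "c"), model))
```

Because each prediction is a sum over matched patterns, the patterns that fired for a given sequence can be listed alongside the predicted label, which is the kind of interpretability the abstract attributes to a pattern-based architecture.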