Sequential Pattern Mining in Multi-Databases via Multiple Alignment

Authors:
Hye-Chung Kum; Joong Hyuk Chang; Wei Wang
Affiliations:
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, U.S.A.; Department of Computer Science, Yonsei University, Seoul, Korea 120-749; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, U.S.A.
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

To efficiently find global patterns in a multi-database, the information in each local database must first be mined and summarized at the local level; only the summarized information is then forwarded to the global mining process. However, conventional support-based sequential pattern mining methods cannot summarize local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, that mines approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high-vote sequential patterns and exceptional sequential patterns from the collection of consensus patterns obtained from each local database.
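The two-step idea (cluster similar sequences, then read a consensus pattern off an alignment of each cluster) can be illustrated with a small sketch. This is not ApproxMAP itself, and every name here (`S`, `repl`, `seq_distance`, `align_pairs`, `cluster_sequences`, `consensus`, `max_dist`, `min_strength`) is hypothetical. Several pieces are assumptions or stand-ins: the itemset replacement cost (normalized symmetric difference) is assumed, a greedy distance-threshold clustering replaces the paper's clustering step, and a simple center-star alignment against the cluster medoid replaces its multiple alignment.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = List[Itemset]


def S(*itemsets: str) -> Sequence:
    """Build a sequence from strings; each string is one itemset of single-character items."""
    return [frozenset(x) for x in itemsets]


def repl(x: Itemset, y: Itemset) -> float:
    """Assumed replacement cost: normalized symmetric difference (0 = equal, 1 = disjoint)."""
    if not x and not y:
        return 0.0
    return len(x ^ y) / (len(x) + len(y))


def _dp(a: Sequence, b: Sequence) -> List[List[float]]:
    """Edit-distance table: substitutions cost repl(), insertions/deletions cost 1."""
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = float(i)
    for j in range(1, len(b) + 1):
        d[0][j] = float(j)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + repl(a[i - 1], b[j - 1]))
    return d


def seq_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance normalized by the longer sequence length, so values fall in [0, 1]."""
    return _dp(a, b)[len(a)][len(b)] / max(len(a), len(b), 1)


def align_pairs(a: Sequence, b: Sequence) -> List[Tuple[int, int]]:
    """Trace back one optimal alignment; return (i, j) pairs of positions aligned to each other."""
    d = _dp(a, b)
    i, j, pairs = len(a), len(b), []
    while i > 0 and j > 0:
        if d[i][j] == d[i - 1][j - 1] + repl(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # a[i-1] is aligned to a gap
        else:
            j -= 1  # b[j-1] is aligned to a gap
    return pairs[::-1]


def cluster_sequences(db: List[Sequence], max_dist: float = 0.5) -> List[List[Sequence]]:
    """Stand-in clustering (not ApproxMAP's): greedy leader clustering by a distance threshold."""
    clusters: List[List[Sequence]] = []
    for s in db:
        for cluster in clusters:
            if seq_distance(cluster[0], s) <= max_dist:  # compare with the cluster's first member
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def consensus(cluster: List[Sequence], min_strength: float = 0.5) -> Sequence:
    """Center-star alignment stand-in; keep items present in >= min_strength of the cluster."""
    # The center is the medoid: the member with the smallest total distance to the others.
    center = min(cluster, key=lambda c: sum(seq_distance(c, s) for s in cluster))
    counts = [Counter() for _ in center]
    for s in cluster:
        for i, j in align_pairs(center, s):
            counts[i].update(s[j])  # items of s[j] vote for center position i
    pattern = []
    for col in counts:
        items = frozenset(it for it, c in col.items() if c / len(cluster) >= min_strength)
        if items:
            pattern.append(items)
    return pattern


if __name__ == "__main__":
    db = [S("a", "bc", "d"), S("a", "b", "d"), S("a", "bc", "e", "d"),
          S("x", "y"), S("x", "yz")]
    for cluster in cluster_sequences(db):
        print(["".join(sorted(items)) for items in consensus(cluster)])
    # ['a', 'bc', 'd']
    # ['x', 'yz']
```

In this toy database, the three `a ... d` sequences form one cluster and the two `x ... y` sequences form another. Each cluster collapses to a single consensus pattern even though no input sequence has to match it exactly, which is the summarizing effect the abstract describes. A known limitation of the center-star shortcut is that items which appear only in insertions relative to the center (such as `e` above) can never enter the consensus. A true multiple alignment does not have this restriction.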