A sequence, which appears in many applications (e.g., event logs and text documents), is an ordered list of objects. Exploring correlated objects in a sequence can reveal useful relationships among the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query, which finds correlated pairs of objects that often appear close to each other in a given sequence. A correlation query is specified by two control parameters: the distance bound, which sets how close two objects must be, and the correlation threshold, which sets the minimum correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, as the Multi-Scan Algorithm (MSA) does, we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses the queried sequence only once, while IBA takes the correlation threshold into account during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all three algorithms. Among them, IBA is the most efficient, significantly outperforming the others.
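
To make the two query parameters concrete, the following is a minimal Python sketch of a single-pass pair search. It is not the paper's OSA or IBA, whose details the abstract does not give. The function name `correlated_pairs`, the sliding-window bookkeeping, and in particular the score (co-occurrence count divided by the geometric mean of the two object frequencies) are assumptions chosen only for illustration; the abstract does not define the correlation measure.

```python
from collections import Counter, deque
from math import sqrt


def correlated_pairs(sequence, distance_bound, threshold):
    """Scan `sequence` once and return {pair: score} for pairs of distinct
    objects whose illustrative score is at least `threshold`.

    Assumed score (not taken from the paper):
        co(a, b) / sqrt(freq(a) * freq(b))
    where co(a, b) counts position pairs holding a and b that are at most
    `distance_bound` positions apart.
    """
    freq = Counter()                       # occurrences of each object
    co = Counter()                         # close co-occurrences per unordered pair
    window = deque(maxlen=distance_bound)  # the last `distance_bound` objects

    for obj in sequence:
        freq[obj] += 1
        # Every object still in the window lies within the distance bound.
        for prev in window:
            if prev != obj:
                co[tuple(sorted((prev, obj)))] += 1
        window.append(obj)

    result = {}
    for (a, b), count in co.items():
        score = count / sqrt(freq[a] * freq[b])
        if score >= threshold:
            result[(a, b)] = score
    return result


if __name__ == "__main__":
    events = list("abxabyab")
    print(correlated_pairs(events, distance_bound=1, threshold=0.9))
```

Raising `distance_bound` admits pairs that sit farther apart, while raising `threshold` keeps only the stronger pairs. Note that this sketch applies the threshold only after counting every candidate pair, which is the kind of work the abstract says IBA avoids by considering the threshold during execution.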