Generalized regression model for sequence matching and clustering

Authors:
Hansheng Lei;Venu Govindaraju
Affiliations:
University of Texas at Brownsville, Department of Computer Science and Computer Information Systems, 78520, Brownsville, TX, USA;The State University of New York at Buffalo, Computer Science and Engineering Department, Amherst, NY, USA
Venue:
Knowledge and Information Systems
Year:
2007

Abstract

Linear relations have been found to be valuable in rule discovery for stocks, for example: if stock X goes up by a, stock Y will go down by b. Traditional linear regression models the linear relation between two sequences faithfully. However, if a user requires clustering of stocks into groups whose sequences have high linearity or similarity with each other, comparing the sequences pair by pair is prohibitively expensive. In this paper, we present the generalized regression model (GRM), which matches the linearity of multiple sequences at a time. GRM also gives strong heuristic support for graceful and efficient clustering. Experiments on stocks in the NASDAQ market efficiently mined interesting clusters of stock trends.
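
To make the pairwise baseline concrete, the following is a minimal sketch of traditional two-sequence linear regression applied to every pair of sequences. It is not GRM itself, which the abstract does not specify. The function names `fit_line` and `pairwise_linearity` and the toy price data are hypothetical, and the squared correlation coefficient is assumed here as the measure of linearity.

```python
from itertools import combinations


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the squared correlation
    coefficient, used here as an assumed measure of linearity.
    Assumes equal-length, non-constant sequences.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def pairwise_linearity(series):
    """Run one regression per pair of sequences: n * (n - 1) / 2 fits in total."""
    return {
        (a, b): fit_line(series[a], series[b])[2]
        for a, b in combinations(sorted(series), 2)
    }


# Hypothetical toy data: Y falls by 2 whenever X rises by 1.
series = {
    "X": [1.0, 2.0, 3.0, 4.0],
    "Y": [10.0, 8.0, 6.0, 4.0],
    "Z": [1.0, 3.0, 2.0, 5.0],
}
scores = pairwise_linearity(series)
```

With n sequences this approach needs n(n-1)/2 regressions, which is the quadratic cost the abstract describes as prohibitive for clustering many stocks.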