Windowed subsequence matching over deterministic strings has been studied in the contexts of knowledge discovery, data mining, and molecular biology. However, we observe that in these applications, as well as in data stream monitoring, complex event processing, and time series processing, where streams can be mapped to strings, the strings are often noisy and probabilistic. We study this problem in the online setting, where efficiency is paramount. We first formulate the query semantics and propose an exact algorithm. We then propose a randomized approximation algorithm that is faster and, at the same time, provably accurate. Moreover, we devise a filtering algorithm that further improves efficiency through an optimization technique that adapts to the contents of the sequence stream. Finally, we propose algorithms for patterns with negations. To evaluate the algorithms, we conduct a systematic empirical study using three real datasets and several synthetic datasets.
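To make the problem setting concrete, the following is a minimal sketch of one possible reading of windowed subsequence matching over a probabilistic string. It is not the exact, randomized, or filtering algorithm from the paper. It assumes that each stream position carries an independent distribution over symbols, and that the quantity of interest is the probability that the pattern occurs as a subsequence within a window of `w` consecutive positions. The function names, the dictionary representation of a position, and the independence assumption are all illustrative choices that do not come from the source.

```python
from typing import Dict, List, Sequence


def window_match_probability(window: Sequence[Dict[str, float]], pattern: str) -> float:
    """Probability that `pattern` is a subsequence of a random string whose
    i-th symbol is drawn independently from window[i] (symbol -> probability).

    Assumption: positions are independent. Greedy leftmost matching decides
    subsequence containment, so it suffices to track how many pattern symbols
    the greedy matcher has consumed so far.
    """
    m = len(pattern)
    # state[j] = probability that exactly j pattern symbols are matched so far
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for dist in window:
        nxt = [0.0] * (m + 1)
        nxt[m] = state[m]  # a completed match stays completed
        for j in range(m):
            p = dist.get(pattern[j], 0.0)
            nxt[j + 1] += state[j] * p        # next pattern symbol observed
            nxt[j] += state[j] * (1.0 - p)    # any other symbol observed
        state = nxt
    return state[m]


def sliding_match_probabilities(
    stream: Sequence[Dict[str, float]], pattern: str, w: int
) -> List[float]:
    """Match probability for every window of length w (recomputed per window)."""
    return [
        window_match_probability(stream[i:i + w], pattern)
        for i in range(len(stream) - w + 1)
    ]


if __name__ == "__main__":
    # Hypothetical noisy stream: the first symbol is certainly "a",
    # the next two are "b" or "c" with equal probability.
    stream = [{"a": 1.0}, {"b": 0.5, "c": 0.5}, {"b": 0.5, "c": 0.5}]
    print(sliding_match_probabilities(stream, "ab", 3))
    print(sliding_match_probabilities(stream, "ab", 2))
```

This sketch recomputes each window from scratch, costing time proportional to the window length times the pattern length per window. That per-window cost is the kind of overhead an online setting would want to avoid.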