The problem of characterizing and detecting recurrent sequence patterns, such as substrings or motifs, and related associations or rules is pursued for various purposes: to compress data, unveil structure, infer succinct descriptions, extract and classify features, and so on. In molecular biology, exceptionally frequent or rare words in biosequences have been implicated in various facets of biological function and structure. The discovery of such patterns, particularly on a massive scale, poses interesting methodological and algorithmic problems, and often exposes scenarios in which tables and synopses grow faster and bigger than the raw sequences they are meant to encapsulate. In previous work, the ability to succinctly compute, store, and display unusual substrings has been linked to a subtle interplay between the combinatorics of the subwords of a word and local monotonicities of some scores used to measure the departure from expectation. In this paper, we carry out an extensive analysis of such monotonicities for a broader variety of scores. This analysis supports the construction of data structures and algorithms capable of performing global detection of unusual substrings in time and space linear in the length of the subject sequences, under various probabilistic models.
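To make the notion of a score that measures "departure from expectation" concrete, the following is a minimal sketch and not the method of the paper. It assumes an i.i.d. symbol model estimated from the sequence itself, and it uses the simplest possible score, the observed count minus the expected count. The function names (`symbol_probs`, `observed_count`, `expected_count`, `surprise`) are hypothetical, and the naive counting is quadratic, not the linear-time machinery the abstract refers to.

```python
from collections import Counter


def symbol_probs(text):
    """Estimate i.i.d. symbol probabilities from the text (an assumed model)."""
    counts = Counter(text)
    return {c: k / len(text) for c, k in counts.items()}


def observed_count(text, word):
    """Count occurrences of word in text, overlaps included (naive scan)."""
    positions = range(len(text) - len(word) + 1)
    return sum(1 for i in positions if text.startswith(word, i))


def expected_count(text, word, probs):
    """Expected number of occurrences of word under the i.i.d. model."""
    p = 1.0
    for c in word:
        p *= probs.get(c, 0.0)
    # Number of starting positions at which word could occur.
    slots = max(0, len(text) - len(word) + 1)
    return slots * p


def surprise(text, word, probs):
    """Illustrative score: observed count minus expected count."""
    return observed_count(text, word) - expected_count(text, word, probs)


if __name__ == "__main__":
    text = "abcabc"
    probs = symbol_probs(text)
    for w in ("ab", "abc"):
        print(w, observed_count(text, w), surprise(text, w, probs))
```

In this toy input, "ab" and its extension "abc" occur the same number of times. Under the assumed i.i.d. model the expected count of the longer word cannot exceed that of the shorter one, so this particular score does not decrease when "ab" is extended to "abc". This is one simple instance of the kind of local monotonicity the abstract alludes to: among words with equal counts, only the longest needs to be examined as a candidate over-represented word.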