Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. The algorithm also allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures.

Availability: Gemoda is freely available at http://web.mit.edu/bamel/gemoda

Contact: gregstep@mit.edu

Supplementary Information: Available at http://web.mit.edu/bamel/gemoda
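To make the idea of a metric-agnostic, exhaustive comparison concrete, the following is a minimal Python sketch. It is not Gemoda's implementation, which the abstract does not describe. It only shows how one window-comparison routine can serve both categorical and real-valued sequences when the similarity function is passed in as a parameter. All names (`windows`, `similar_window_pairs`, `matches`, `neg_euclidean`), the fixed window length and the threshold are assumptions made for illustration.

```python
from itertools import combinations


def windows(sequences, length):
    """List every window of the given length as (sequence index, offset)."""
    return [
        (s, i)
        for s, seq in enumerate(sequences)
        for i in range(len(seq) - length + 1)
    ]


def similar_window_pairs(sequences, length, similarity, threshold):
    """Exhaustively compare all windows; keep pairs scoring >= threshold.

    `similarity` is any callable taking two equal-length windows and
    returning a number, so the routine is independent of the data type.
    """
    pairs = []
    for (s1, i1), (s2, i2) in combinations(windows(sequences, length), 2):
        if s1 == s2 and abs(i1 - i2) < length:
            continue  # skip overlapping windows within the same sequence
        a = sequences[s1][i1:i1 + length]
        b = sequences[s2][i2:i2 + length]
        if similarity(a, b) >= threshold:
            pairs.append(((s1, i1), (s2, i2)))
    return pairs


def matches(a, b):
    """Categorical similarity: number of positions with identical symbols."""
    return sum(x == y for x, y in zip(a, b))


def neg_euclidean(a, b):
    """Real-valued similarity: negated Euclidean distance (higher is closer)."""
    return -(sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5)


# Categorical data: exact 4-mers shared between two DNA strings.
dna = ["ACGTACGA", "TTACGTTT"]
dna_pairs = similar_window_pairs(dna, 4, matches, 4)

# Real-valued data: windows of length 2 within Euclidean distance 0.2.
series = [[0.0, 1.0, 2.0, 5.0], [0.1, 1.1, 9.0, 9.0]]
series_pairs = similar_window_pairs(series, 2, neg_euclidean, -0.2)
```

Because every pair of windows is compared, the search is deterministic and exhaustive for the chosen window length, at quadratic cost in the number of windows. Grouping the resulting pairs into motifs and extending them to maximal length are separate steps that this sketch does not attempt.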