A generic motif discovery algorithm for sequential data

Authors:
Kyle L. Jensen;Mark P. Styczynski;Isidore Rigoutsos;Gregory N. Stephanopoulos
Affiliations:
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA;Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA;Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA;Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Venue:
Bioinformatics
Year:
2006

Cited by 14

Establishing relationships among patterns in stock market data
Data & Knowledge Engineering

Clustering sequences by overlap
International Journal of Data Mining and Bioinformatics

Discovering multivariate motifs using subsequence density estimation and greedy mixture learning
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1

Improving activity discovery with automatic neighborhood estimation
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

VOGUE: A variable order hidden Markov model with duration based on frequent sequence mining
ACM Transactions on Knowledge Discovery from Data (TKDD)

Prism: An effective approach for frequent sequence mining via prime-block encoding
Journal of Computer and System Sciences

Unsupervised simultaneous learning of gestures, actions and their associations for human-robot interaction
IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems

Privacy-preserving discovery of frequent patterns in time series
ICDM'07 Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications

Generalised Sequence Signatures through symbolic clustering
International Journal of Data Mining and Bioinformatics

A frequent pattern mining method for finding planted (l, d)-motifs of unknown length
RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology

Graphical approach to weak motif recognition in noisy data sets
PRIB'06 Proceedings of the 2006 international conference on Pattern Recognition in Bioinformatics

CPMD: a matlab toolbox for change point and constrained motif discovery
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence

G-SteX: greedy stem extension for free-length constrained motif discovery
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence

Maximal clique enumeration for large graphs on hadoop framework
Proceedings of the first workshop on Parallel programming for analytics applications

Abstract

Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. The algorithm also allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures.

Availability: Gemoda is freely available at http://web.mit.edu/bamel/gemoda

Contact: gregstep@mit.edu

Supplementary Information: Available at http://web.mit.edu/bamel/gemoda
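
The abstract does not spell out Gemoda's individual steps, so the following is only an illustrative sketch of one idea it states: a user-chosen similarity metric applied to fixed-length windows of either categorical or real-valued sequences. It is not Gemoda's implementation. The function names (`match_count`, `neg_euclidean`, `similar_window_pairs`), the brute-force pairwise comparison and the threshold convention are all assumptions made for this example.

```python
import math
from itertools import combinations


def match_count(a, b):
    """Similarity for categorical windows: number of matching positions."""
    return sum(x == y for x, y in zip(a, b))


def neg_euclidean(a, b):
    """Similarity for real-valued windows: negated Euclidean distance."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_window_pairs(sequences, length, similarity, threshold):
    """Return pairs of (sequence index, offset) whose windows score >= threshold.

    Hypothetical helper: compares every window with every window from a
    different sequence, using whatever similarity function is passed in.
    """
    windows = [
        (s, i, seq[i:i + length])
        for s, seq in enumerate(sequences)
        for i in range(len(seq) - length + 1)
    ]
    return [
        ((sa, ia), (sb, ib))
        for (sa, ia, wa), (sb, ib, wb) in combinations(windows, 2)
        if sa != sb and similarity(wa, wb) >= threshold
    ]


# Categorical data (DNA-like strings), allowing one mismatch in a window of 4.
dna_pairs = similar_window_pairs(["ACGTAC", "TTCGTA"], 4, match_count, 3)

# Real-valued data, same routine with a different metric.
real_pairs = similar_window_pairs(
    [[0.0, 1.0, 2.0], [0.1, 1.1, 5.0]], 2, neg_euclidean, -0.2
)
```

Because the metric is passed in as a plain function, the same routine handles strings and numeric series unchanged. That is the sense in which the sketch mirrors the abstract's claim of metric independence, under the assumptions above.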