Compressed Pattern Matching in DNA Sequences

Authors: Lei Chen; Shiyong Lu; Jeffrey Ram
Affiliations: Wayne State University; Wayne State University; Wayne State University
Venue: CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year: 2004

Abstract

We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm for DNA sequences. The algorithm is based on the Boyer-Moore method, one of the most popular string matching algorithms. We compress both the DNA sequences and the patterns by representing each of the characters A, T, C, and G with two bits. Experiments indicate that d-BM searches for long DNA patterns (length 50) more than 10 times faster than the exact-match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compressing DNA sequences in this way guarantees a space saving of 75% relative to storing one byte per character. The enhanced speed is due in part to the increased efficiency of the Boyer-Moore method that results from enlarging the alphabet from 4 symbols to 256, since each packed byte holds four bases.
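
The idea of searching directly in two-bit packed DNA can be illustrated with a minimal Python sketch. It is not the authors' d-BM algorithm: it substitutes the simpler Boyer-Moore-Horspool variant, and the bit assignment (A=0, C=1, G=2, T=3), the function names, and the four-alignment strategy are all assumptions made for illustration. The sketch packs four bases into each byte, searches for the byte-aligned core of the pattern over the 256-symbol byte alphabet, and then verifies each candidate base by base.

```python
# Assumed 2-bit code; the paper's actual bit assignment may differ.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte (first base in the high bits)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        out[i // 4] |= CODE[ch] << (6 - 2 * (i % 4))
    return bytes(out)


def base_at(packed, i):
    """Decode the i-th base of a packed sequence."""
    return BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 3]


def horspool(text, pat):
    """Yield every start index of pat in text (both bytes) using Boyer-Moore-Horspool."""
    m = len(pat)
    # Bad-character shifts over the byte alphabet (up to 256 symbols).
    shift = {b: m - 1 - i for i, b in enumerate(pat[:-1])}
    pos = 0
    while pos + m <= len(text):
        if text[pos:pos + m] == pat:
            yield pos
        pos += shift.get(text[pos + m - 1], m)


def search(packed, n, pattern):
    """Return sorted start positions of pattern in the n-base sequence stored in packed."""
    m = len(pattern)
    if m < 7:
        raise ValueError("this sketch needs at least 7 bases so every alignment has a whole byte")
    hits = []
    for offset in range(4):        # a match may start at any of 4 positions inside a byte
        lead = (4 - offset) % 4    # pattern bases that precede the first byte boundary
        whole = (m - lead) // 4    # whole bytes fully covered by the pattern
        core = pack(pattern[lead:lead + 4 * whole])
        for b in horspool(packed, core):
            start = 4 * b - lead
            # Re-check the whole candidate, including the unaligned edges and padding.
            if start >= 0 and start + m <= n and all(
                base_at(packed, start + k) == pattern[k] for k in range(m)
            ):
                hits.append(start)
    return sorted(hits)


text = "GATTACAGATTACAGGG"
packed = pack(text)                 # 17 bases stored in 5 bytes
result = search(packed, len(text), "GATTACA")   # -> [0, 7]
```

Running the byte-level search once per alignment keeps the text compressed throughout; only the candidate positions are decoded. Longer patterns give longer byte cores and therefore larger Horspool shifts, which is consistent with the abstract's emphasis on long patterns.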