Brief communication: An efficient similarity search based on indexing in large DNA databases

Authors:
In-Seon Jeong;Kyoung-Wook Park;Seung-Ho Kang;Hyeong-Seok Lim
Affiliations:
School of Electronics & Computer Eng., Chonnam National University, 300 YongBong-Dong, Buk-Gu, Gwangju 500-757, Republic of Korea (all authors)
Venue:
Computational Biology and Chemistry
Year:
2010

Abstract

Index-based search algorithms are an important part of genomic search, and the way the index is constructed is key to how well such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. The method requires little storage and quickly computes the similarity between two sequences in a DNA sequence database. First, each sequence is partitioned into equal-length windows. Candidate subsequences are then selected by computing their Hamming distance to the query sequence. The algorithm next transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters together with the positional information of the characters in the subsequences. Our experiments show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those algorithms.
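The steps named in the abstract (windowing, Hamming-distance filtering, and a frequency-plus-position transformation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the transformation, so the feature vector below (per-character count and mean position) is an assumed stand-in, and the names `windows`, `hamming`, `candidates`, `feature_vector`, and the `max_dist` threshold are hypothetical. The sketch also assumes non-overlapping windows whose length equals the query length.

```python
from typing import List, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet


def windows(seq: str, w: int) -> List[Tuple[int, str]]:
    """Split seq into non-overlapping windows of length w.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def candidates(seq: str, query: str, max_dist: int) -> List[Tuple[int, str]]:
    """Keep the windows whose Hamming distance to the query is at most max_dist."""
    return [
        (start, sub)
        for start, sub in windows(seq, len(query))
        if hamming(sub, query) <= max_dist
    ]


def feature_vector(sub: str) -> List[float]:
    """Map a subsequence to a vector of (count, mean 1-based position) per character.

    This particular encoding is an assumption chosen for illustration; a
    character that does not occur contributes (0.0, 0.0).
    """
    vec: List[float] = []
    for ch in ALPHABET:
        positions = [i + 1 for i, c in enumerate(sub) if c == ch]
        vec.append(float(len(positions)))
        vec.append(sum(positions) / len(positions) if positions else 0.0)
    return vec
```

For example, `candidates("ACGTACGAGGGG", "ACGT", 1)` keeps the first two windows and discards `GGGG`, and each surviving window could then be passed to `feature_vector` to obtain its point in the vector space used for indexing.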