Variable-length intervals in homology search

Authors:
Abhijit Chattaraj; Hugh E. Williams
Affiliations:
RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Abstract

Fast, accurate, and scalable techniques for homology searching of large genomic collections are increasingly important as genomic sequence collections continue to double in size almost yearly. Almost all homology search techniques extract fixed-length overlapping subsequences from queries and database sequences and compare these as the first step in query evaluation; this is a feature of well-known tools such as FASTA, BLAST, and our own CAFE technique. In this paper we discuss a novel, variable-length approach to extracting subsequences that is based on homology scoring matrices. Our motivation is to balance the speed and accuracy of fixed-length choices, that is, to combine the speed of longer subsequence lengths with the accuracy of shorter ones. We show that incorporating this approach into our CAFE technique leads to a good compromise between accuracy and retrieval efficiency when searching with BLOSUM matrices sensitive to distant evolutionary relationships. We expect the same results would be achieved with other homology search techniques.
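
The fixed-length first step described in the abstract can be illustrated with a minimal Python sketch. It is not the CAFE, FASTA, or BLAST implementation, and it does not attempt the paper's variable-length method, whose details are not given here. The function names, the toy DNA sequences, and the simple count of shared intervals are all assumptions made for illustration.

```python
from collections import defaultdict


def fixed_length_intervals(sequence, length):
    """Return every overlapping subsequence of the given length, in order."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def shared_interval_counts(query, database, length):
    """Count, per database sequence, the intervals that also occur in the query.

    Hypothetical helper: a coarse first-pass filter only, with no alignment
    or scoring matrix involved.
    """
    query_intervals = set(fixed_length_intervals(query, length))
    counts = defaultdict(int)
    for name, sequence in database.items():
        for interval in fixed_length_intervals(sequence, length):
            if interval in query_intervals:
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    database = {"seq1": "ACGTACGTGA", "seq2": "TTGACCAGTA"}
    query = "ACGTGA"
    for length in (3, 5):
        print(length, shared_interval_counts(query, database, length))
```

On this toy data the longer interval length yields fewer shared intervals to process, and the weaker match to the second sequence is no longer reported. This is the speed versus accuracy trade-off between interval lengths that motivates a variable-length choice.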