Exhaustive peptide searching using relations

Authors:
Ela Hunt
Affiliations:
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Venue:
BNCOD'07 Proceedings of the 24th British national conference on Databases
Year:
2007

Abstract

We present a new, robust solution to short peptide searching, tested on a relational platform with a set of biological queries. Our algorithm is suited to large-scale scientific data analysis and has been tested with 1.4 GB of amino acids. Protein sequences are indexed as short overlapping string windows and stored in a relation. To find approximate matches, we use a neighbourhood generation algorithm. The words in the neighbourhood are then fetched and stored in a relation. We measure execution time and compare the matches found with those delivered by BLAST. We report some performance gains in exact matching and in searching within edit distance 1, and very significant quality improvements over heuristics, because we guarantee delivery of all relevant matches.
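The abstract does not give the schema or the neighbourhood algorithm, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes Python's built-in `sqlite3` as a stand-in relational platform, a hypothetical `windows` table of overlapping substrings, a hypothetical `neighbours` table for the generated words, and a window length `w` at least one longer than the query so that insertions still fit. Matching is done by comparing each neighbour against the prefix of each window.

```python
import sqlite3

# The 20 standard amino acid letters (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows(seq, w):
    """Yield (offset, window) for every start position; trailing windows are shorter."""
    for i in range(len(seq)):
        yield i, seq[i:i + w]


def neighbourhood(word):
    """Return every non-empty word within edit distance 1 of `word`."""
    result = {word}
    for i in range(len(word) + 1):
        for a in AMINO_ACIDS:
            result.add(word[:i] + a + word[i:])          # insertion before position i
        if i < len(word):
            result.add(word[:i] + word[i + 1:])          # deletion of position i
            for a in AMINO_ACIDS:
                result.add(word[:i] + a + word[i + 1:])  # substitution at position i
    result.discard("")
    return result


def build_index(proteins, w):
    """Store all overlapping windows of each protein in a relation (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE windows (protein_id TEXT, pos INTEGER, word TEXT)")
    rows = [(pid, pos, win)
            for pid, seq in proteins.items()
            for pos, win in windows(seq, w)]
    conn.executemany("INSERT INTO windows VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query):
    """Store the query's neighbourhood in a relation and join it against window prefixes."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS neighbours (word TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM neighbours")
    conn.executemany("INSERT INTO neighbours VALUES (?)",
                     [(n,) for n in neighbourhood(query)])
    cur = conn.execute(
        "SELECT DISTINCT w.protein_id, w.pos, n.word "
        "FROM windows w JOIN neighbours n "
        "ON substr(w.word, 1, length(n.word)) = n.word "
        "ORDER BY w.protein_id, w.pos, n.word"
    )
    return cur.fetchall()
```

For example, `search(build_index({"P1": "MKTAYIAK"}, w=6), "TAYI")` returns rows of the form `(protein_id, pos, matched_word)`. Because every word within edit distance 1 is enumerated and looked up, no qualifying occurrence is skipped, which is the sense in which such a search is exhaustive. The prefix comparison via `substr` is chosen here for clarity rather than speed; a production index would need a layout that lets the database use an index on `word`.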