M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A Probabilistic Spell for the Curse of Dimensionality
ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
Similarity Search: The Metric Space Approach (Advances in Database Systems)
Unified framework for fast exact and approximate search in dissimilarity spaces
ACM Transactions on Database Systems (TODS)
Analyzing Metric Space Indexes: What For?
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Parallel Dynamic Batch Loading in the M-tree
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
An application of the metric access methods to the mass spectrometry data
CIBCB'09 Proceedings of the 6th Annual IEEE conference on Computational Intelligence in Bioinformatics and Computational Biology
Element detection relying on information retrieval techniques applied to laser spectroscopy
Proceedings of the Fourth International Conference on SImilarity Search and APplications
Protein sequences identification using NM-tree
Proceedings of the Fourth International Conference on SImilarity Search and APplications
In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an "in vitro" sample. The sequences are not determined directly; they must be interpreted from the mass spectra produced by the mass spectrometer. This work focuses on a similarity-search approach to mass spectra interpretation, in which the parametrized Hausdorff distance (dHP) serves as the similarity measure. To make similarity search under dHP efficient, we employ metric access methods together with the TriGen algorithm, which controls the metricity of dHP. We show that similarity search using dHP interprets peptide mass spectra more correctly than the cosine similarity commonly found in the mass spectrometry literature. Moreover, the dHP-based search model can be extended to support chemical modifications in the query mass spectra, which is typically a problem when cosine similarity is used. Our approach can serve as a coarse filter for any other database approach to mass spectra interpretation.
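The abstract does not spell out how dHP is defined, so the following is only a minimal sketch of a tolerance-based, Hausdorff-style distance under stated assumptions, not the authors' definition. It assumes each spectrum is a list of peak m/z values (intensities ignored), that two peaks match when they differ by at most a tolerance `eps`, and that the distance is the larger of the two "fraction of unmatched peaks" values. The names `tolerant_hausdorff`, `triangle_violations` and `eps` are hypothetical. The sketch also counts triplets that break the triangle inequality, illustrating why a tolerance-based measure like this need not be a metric, which is the kind of behaviour TriGen is meant to control before metric access methods can be applied.

```python
from bisect import bisect_left
from itertools import permutations
from typing import List, Sequence


def _unmatched_fraction(a: Sequence[float], b: Sequence[float], eps: float) -> float:
    """Fraction of peaks in `a` with no peak in sorted `b` within +/- eps."""
    if not a:
        return 0.0
    unmatched = 0
    for x in a:
        i = bisect_left(b, x - eps)  # first peak in b that is >= x - eps
        if i == len(b) or b[i] > x + eps:
            unmatched += 1  # no peak of b falls inside [x - eps, x + eps]
    return unmatched / len(a)


def tolerant_hausdorff(a: Sequence[float], b: Sequence[float], eps: float) -> float:
    """Hypothetical parametrized Hausdorff-style distance in [0, 1].

    Assumption: the distance is the larger of the two directed
    unmatched-peak fractions; this is NOT taken from the paper.
    """
    sa, sb = sorted(a), sorted(b)
    return max(_unmatched_fraction(sa, sb, eps), _unmatched_fraction(sb, sa, eps))


def triangle_violations(spectra: List[Sequence[float]], eps: float) -> int:
    """Count ordered triplets (x, y, z) with d(x, z) > d(x, y) + d(y, z)."""

    def d(i: int, j: int) -> float:
        return tolerant_hausdorff(spectra[i], spectra[j], eps)

    return sum(
        1
        for x, y, z in permutations(range(len(spectra)), 3)
        if d(x, z) > d(x, y) + d(y, z)
    )


if __name__ == "__main__":
    # Each neighbouring pair is within tolerance, but the outer pair is not.
    chain = [[100.0], [100.4], [100.8]]
    print(tolerant_hausdorff(chain[0], chain[1], 0.5))  # 0.0
    print(tolerant_hausdorff(chain[0], chain[2], 0.5))  # 1.0
    print(triangle_violations(chain, 0.5))  # 2
```

In the usage example, matching within a tolerance is not transitive: the first and second spectra match, as do the second and third, yet the first and third do not, so the triangle inequality fails. Counting violating triplets on a sample is only a rough analogue of the metricity estimate TriGen works with; it is not TriGen itself.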