Trying to outperform a well-known index with a sequential scan

Authors: Jan Hentschel; Thomas Meyer; Thomas Rommel
Affiliations: Otto-von-Guericke-University, Magdeburg, Germany (all three authors)
Venue: Proceedings of the Joint EDBT/ICDT 2013 Workshops
Year: 2013

Abstract

String similarity search is an important research area. It enables applications to tolerate input errors and to detect similarities between strings. Formally, this kind of search is the string similarity search problem. The time needed to solve this problem depends on the number of strings to search, their length, and the size of their alphabet. The data can be divided into natural-language data and non-natural-language data. As natural-language data, this paper analyzes a set of names of cities from all over the world. As non-natural-language data, it uses reads from the human genome. The paper investigates whether a sequential search algorithm can outperform an index-based search. The evaluation shows that the index-based search performs better on the human genome reads, but not on the geographical names.
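As an illustration of the sequential side of this comparison, the following is a minimal Python sketch of a scan-based similarity search. It assumes that the similarity measure is the Levenshtein (edit) distance with a fixed threshold `k`, which the abstract does not state. The functions `within_edit_distance` and `sequential_search` are hypothetical names, not taken from the paper. Because every string in the data set is compared against the query, the cost of this scan grows with the number and the length of the strings.

```python
# Hypothetical sketch, not the authors' implementation: assumes the
# similarity measure is Levenshtein distance with a fixed threshold k.


def within_edit_distance(a: str, b: str, k: int) -> bool:
    """Return True if the Levenshtein distance between a and b is at most k."""
    # Length filter: the edit distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > k:
        return False
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        # Row minima never decrease, so once every entry exceeds k,
        # the final distance must exceed k as well.
        if min(curr) > k:
            return False
        prev = curr
    return prev[-1] <= k


def sequential_search(data: list[str], query: str, k: int) -> list[str]:
    """Scan every string and keep those within edit distance k of the query."""
    return [s for s in data if within_edit_distance(query, s, k)]


if __name__ == "__main__":
    cities = ["Magdeburg", "Hamburg", "Magdeburgh", "Marburg"]
    print(sequential_search(cities, "Magdeburg", 1))
```

Run with `k = 1`, the example prints `['Magdeburg', 'Magdeburgh']`: "Hamburg" and "Marburg" are rejected by the length filter before any dynamic programming is done. The length filter and the early exit on the row minimum are common pruning steps for scan-based search; the abstract does not say whether the paper's sequential algorithm uses them.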