We consider the problem of similarity search in a very large sequence database, with edit distance as the similarity measure. Given limited main memory, our goal is to develop a reference-based index that reduces the number of costly edit distance computations needed to answer a query. The idea behind reference-based indexing is to select a small set of reference sequences that serve as surrogates for the other sequences in the database. We propose two novel strategies for selecting references, as well as a new strategy for assigning references to database sequences. Our experimental results show that our selection and assignment methods far outperform competing methods. For example, our methods prune up to 20 times as many sequences as the Omni method, and up to 30 times as many as frequency vectors. Our methods also scale well to databases that contain many sequences, very long sequences, or both.
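To make the pruning idea concrete, here is a minimal Python sketch of a reference-based range query. It is an illustration under stated assumptions, not the selection or assignment strategies proposed in this work: it assumes pruning relies on the triangle inequality (which holds because edit distance is a metric), it takes the reference set as given rather than selecting it, and it compares every database sequence against all references instead of an assigned subset. The names `build_index` and `range_query` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def build_index(database, references):
    """Precompute the distance from every database sequence to every reference.

    Assumption: every sequence uses all references. Assigning a different
    subset of references to each sequence is not modelled here.
    """
    return [[edit_distance(s, r) for r in references] for s in database]


def range_query(query, radius, database, references, index):
    """Return (matches, verified): the sequences within `radius` of `query`,
    and how many full edit distance computations against the database were needed.
    """
    q_to_ref = [edit_distance(query, r) for r in references]
    matches, verified = [], 0
    for s, s_to_ref in zip(database, index):
        # Triangle inequality: |d(q,r) - d(s,r)| <= d(q,s) <= d(q,r) + d(s,r)
        lower = max(abs(qd - sd) for qd, sd in zip(q_to_ref, s_to_ref))
        upper = min(qd + sd for qd, sd in zip(q_to_ref, s_to_ref))
        if lower > radius:
            continue               # pruned: cannot be within the radius
        if upper <= radius:
            matches.append(s)      # accepted without computing d(q, s)
            continue
        verified += 1              # bounds are inconclusive: compute d(q, s)
        if edit_distance(query, s) <= radius:
            matches.append(s)
    return matches, verified


if __name__ == "__main__":
    database = ["ACGT", "ACGA", "TTTT", "GGGGGGGG"]
    references = ["ACGT"]          # hypothetical, hand-picked reference
    index = build_index(database, references)
    print(range_query("ACGG", 1, database, references, index))
```

In this sketch the query costs one edit distance computation per reference plus one per sequence that the bounds cannot settle, so the quality of the references determines how many database sequences are pruned or accepted outright.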