Complete inverted files for efficient text retrieval and analysis
Journal of the ACM (JACM)
Introduction to algorithms
Boyer-Moore approach to approximate string matching (extended abstract)
SWAT '90 Proceedings of the second Scandinavian workshop on Algorithm theory
A new approach to text searching
Communications of the ACM
Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Efficient implementation of suffix trees
Software—Practice & Experience
Genetic sequence data retrieval and manipulation based on generalized suffix trees
Fast text searching for regular expressions or automaton searching on tries
Journal of the ACM (JACM)
An orthogonally persistent Java
ACM SIGMOD Record
Algorithms on strings, trees, and sequences: computer science and computational biology
Compact pat trees
q-gram based database searching using a suffix array (QUASAR)
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
Efficient suffix trees on secondary storage
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
On effective multi-dimensional indexing for strings
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Reducing the space requirement of suffix trees
Software—Practice & Experience
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Journal of Algorithms
An efficient object promotion algorithm for persistent object systems
Software—Practice & Experience
Elementary Computability, Formal Languages and Automata
Fully Integrated Data Environments: Persistent Programming Languages, Object Stores, and Programming Environments
Accelerating Protein Classification Using Suffix Trees
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Efficient Index Structures for String Databases
Proceedings of the 27th International Conference on Very Large Data Bases
Providing Orthogonal Persistence for Java (Extended Abstract)
ECOOP '98 Proceedings of the 12th European Conference on Object-Oriented Programming
Factor Oracle: A New Structure for Pattern Matching
SOFSEM '99 Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics
Approximate String-Matching over Suffix Trees
CPM '93 Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching
A New Indexing Method for Approximate String Matching
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching
Architecture of the PEVM: A High-Performance Orthogonally Persistent Java Virtual Machine
POS-9 Revised Papers from the 9th International Workshop on Persistent Object Systems
Overcoming the Memory Bottleneck in Suffix Tree Construction
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Proceedings of the Second International Workshop on Persistence and Java
A Review of the Rationale and Architectures of PJama: a Durable, Flexible, Evolvable and Scalable Orthogonally Persistent Programming Platform
Orthogonal Persistence for the Java[tm] Platform: Specification and Rationale
Constructing chromosome scale suffix trees
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
PSIST: Indexing Protein Structures Using Suffix Trees
CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
An efficient approach for sequence matching in large DNA databases
Journal of Information Science
An efficient DNA sequence searching method using position specific weighting scheme
Journal of Information Science
A data structure for a sequence of string accesses in external memory
ACM Transactions on Algorithms (TALG)
Survey on index based homology search algorithms
The Journal of Supercomputing
PSIST: A scalable approach to indexing protein structures using suffix trees
Journal of Parallel and Distributed Computing
Towards Efficient Searching on the Secondary Structure of Protein Sequences
Fundamenta Informaticae - Special issue ISMIS'05
VisGenome with CartoonPlus: Supporting large scale genomic analyses via physical space deformation
Future Generation Computer Systems
High throughput and large capacity pipelined dynamic search tree on FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
A practical method for approximate subsequence search in DNA databases
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Exhaustive peptide searching using relations
BNCOD'07 Proceedings of the 24th British national conference on Databases
An experimental study of compressed indexing and local alignments of DNA
COCOA'07 Proceedings of the 1st international conference on Combinatorial optimization and applications
A hash trie filter method for approximate string matching in genomic databases
Applied Intelligence
An indexing scheme for fast and accurate chemical fingerprint database searching
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
ERA: efficient serial and parallel suffix tree construction for very long strings
Proceedings of the VLDB Endowment
Search-Optimized suffix-tree storage for biological applications
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Obtaining provably good performance from suffix trees in secondary storage
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
A novel indexing method for efficient sequence matching in large DNA database environment
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
On-line suffix tree construction with reduced branching
Journal of Discrete Algorithms
Information retrieval of sequential data in heterogeneous XML databases
AMR'05 Proceedings of the Third international conference on Adaptive Multimedia Retrieval: user, context, and feedback
Trying to outperform a well-known index with a sequential scan
Proceedings of the Joint EDBT/ICDT 2013 Workshops
Efficient parallel construction of suffix trees for genomes larger than main memory
Proceedings of the 20th European MPI Users' Group Meeting
RACE: a scalable and elastic parallel system for discovering repeats in very long sequences
Proceedings of the VLDB Endowment
Efficient techniques on retrieving bio-information for active U-healthcare
Personal and Ubiquitous Computing
Our aim is to develop new database technologies for index-based approximate matching of unstructured string data. We explore the potential of the suffix tree data structure in this context. We present a new method of building suffix trees that allows us to build trees larger than the available RAM, which has not been possible hitherto. We show that this method performs in practice as well as the O(n) method of Ukkonen [70]. Using this method we build indexes for 200 Mb of protein and 300 Mbp of DNA whose disk images exceed the available RAM. We show experimentally that suffix trees can be used effectively in approximate string matching with biological data. For a range of query lengths and error bounds, the suffix tree reduces the size of the unoptimised O(mn) dynamic programming calculation required to evaluate string similarity, and the gain from indexing increases with index size. In the indexes we built this reduction is significant: less than 0.3% of the expected matrix is evaluated. We detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes in biological applications.
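For orientation, the unoptimised O(mn) dynamic programming calculation mentioned above can be sketched as follows. This is a standard textbook formulation of approximate matching under unit-cost edit distance, not the suffix tree method of the paper; the function name `approx_occurrences` and its parameters are illustrative assumptions.

```python
def approx_occurrences(pattern, text, k):
    """Return 1-based end positions in `text` where some substring
    matches `pattern` with at most `k` edits (unit-cost insertions,
    deletions and substitutions).

    Illustrative baseline only: it fills every cell of the
    (m + 1) x (n + 1) matrix, column by column, i.e. O(mn) work.
    """
    m = len(pattern)
    # prev[i] = fewest edits aligning pattern[:i] with a substring of
    # the text ending at the previous text position.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        # cur[0] stays 0 because a match may start at any text position.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra text character
                cur[i - 1] + 1,      # missing pattern character
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(approx_occurrences("ACGT", "TTACGTTT", 0))
    print(approx_occurrences("ACGT", "AGGT", 1))
```

A sequential scan like this evaluates all m × n cells for a query of length m against a text of length n. The abstract's claim is that traversing a suffix tree index instead lets most of that matrix go unevaluated.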