The suffix tree is one of the most important data structures in string processing and comparative genomics. However, its space consumption is a bottleneck in large-scale applications such as genome analysis. In this article, we overcome this obstacle. We show how every algorithm that uses a suffix tree as its data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem with the same time complexity. The generic name enhanced suffix array stands for data structures consisting of the suffix array and additional tables. Our new algorithms are not only more space efficient than previous ones, but also faster and easier to implement.
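
To make "the suffix array and additional tables" concrete, here is a minimal Python sketch. It is not the article's implementation. It assumes the additional table is a longest-common-prefix (LCP) table, which is one common choice; the abstract does not say which tables are used. The suffix array is built by naive sorting for clarity, not by a linear-time method. All function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.

    Chosen for readability; this is not a linear-time construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_table(text, sa):
    """LCP table computed with a Kasai-style scan.

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    """
    n = len(text)
    rank = [0] * n                      # rank[i] = position of suffix i in sa
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0                               # current match length, reused across steps
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                  # next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def longest_repeated_substring(text):
    """Classic suffix-tree task answered from the two tables alone."""
    sa = build_suffix_array(text)
    lcp = build_lcp_table(text, sa)
    if not lcp:
        return ""
    r = max(range(len(lcp)), key=lcp.__getitem__)
    return text[sa[r]:sa[r] + lcp[r]]


if __name__ == "__main__":
    print(build_suffix_array("banana"))
    print(longest_repeated_substring("banana"))
```

The longest repeated substring is usually described as the deepest internal node of a suffix tree. In this sketch the same answer is read off as the maximum entry of the LCP table, which illustrates on a small scale how a suffix-tree algorithm can be restated over arrays.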