AS-Index is a new index structure for exact string search in disk-resident databases. Unlike known alternatives, which are based on trees or tries, it uses hashing. It typically indexes every n-gram in the database, though non-dense indexing is also possible. The hash function uses the algebraic signatures of the n-grams. Hashing provides constant index access time for arbitrarily long patterns, whereas the search cost of other structures is at best logarithmic. The storage overhead of AS-Index is about 500-600%, similar to or smaller than that of the alternatives. We present the index structure, our use of algebraic signatures, and the search algorithm. We discuss the design trade-offs and present a theoretical and experimental performance analysis. We compare AS-Index to the main alternatives. We conclude that AS-Index is an attractive structure and indicate directions for future work.
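To make the idea of hashing n-grams by their algebraic signatures concrete, here is a minimal in-memory sketch in Python. It is not the AS-Index implementation: the field GF(2^8), the primitive polynomial 0x11D, the one-symbol signature, and the function names (`gf_mul`, `signature`, `build_index`, `search`) are all assumptions chosen for illustration, and the lookup simply hashes the first n symbols of the pattern and verifies each candidate offset against the text.

```python
from collections import defaultdict

# Assumed parameters (not taken from the abstract): GF(2^8) with the
# primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 and primitive element 2.
PRIM_POLY = 0x11D
ALPHA = 2


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8), reducing by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce
            a ^= PRIM_POLY
        b >>= 1
    return result


def signature(data: bytes) -> int:
    """One-symbol algebraic signature: XOR-sum of data[i] * ALPHA**i."""
    sig = 0
    power = 1                    # ALPHA**0
    for byte in data:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, ALPHA)
    return sig


def build_index(text: bytes, n: int) -> dict:
    """Dense index: map the signature of every n-gram to its offsets."""
    index = defaultdict(list)
    for offset in range(len(text) - n + 1):
        index[signature(text[offset:offset + n])].append(offset)
    return index


def search(text: bytes, index: dict, n: int, pattern: bytes) -> list:
    """Exact search: one hash lookup, then verification of each candidate."""
    if len(pattern) < n:
        raise ValueError("pattern must contain at least n symbols")
    candidates = index.get(signature(pattern[:n]), [])
    return [o for o in candidates if text[o:o + len(pattern)] == pattern]


if __name__ == "__main__":
    text = b"GATTACAGATTACA"
    index = build_index(text, n=4)
    print(search(text, index, 4, b"TTAC"))
```

An 8-bit signature yields at most 256 buckets, so collisions are frequent in this toy version and the verification step does most of the work; a practical index would presumably use a wider signature and disk-resident buckets.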