As databases have expanded in scope from storing purely business data to including XML documents, product catalogs, e-mail messages, and directory data, it has become increasingly important to search them using wild-card string matching. For such data, prefix matching, for example, is more common (and more useful) than exact matching. In many cases, matches must be made on multiple attributes (dimensions) that are correlated with one another. Traditional multi-dimensional index structures, designed with fixed-length numeric data in mind, are not suitable for matching unbounded-length string data.

In this paper, we describe a general technique for adapting a multi-dimensional index structure to wild-card indexing of unbounded-length string data. The key ideas are:

(a) a carefully developed mapping function from strings to rational numbers;
(b) representing an unbounded-length string in an index leaf page by a fixed-length offset to an external key; and
(c) storing multiple elided tries, one per dimension, in an index page to prune the search during traversal of index pages.

These basic ideas affect all index algorithms, and we present efficient algorithms for different types of string matching.

While our technique is applicable to a wide range of multi-dimensional index structures, we instantiate it by adapting the 2-dimensional R-tree to string data. We demonstrate the space effectiveness and time benefits of the resulting string R-tree both analytically and experimentally.
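Idea (a) can be illustrated with a minimal sketch of one possible order-preserving mapping from strings to rationals. The sketch shows why such a mapping turns a prefix match into a range query, and therefore turns a multi-attribute prefix match into a rectangle query.

It rests on several assumptions:

- It is not necessarily the mapping developed in the paper.
- Strings are assumed to use only the lowercase letters a to z.
- The names `to_rational`, `prefix_interval` and `box_query` are hypothetical.
- A linear scan stands in for the R-tree.

Each character gets a digit from 1 to k in base k + 1, and digit 0 is never used. As a result, a proper prefix always maps strictly below its extensions. All strings sharing a prefix `p` then fall in the half-open interval `[v(p), v(p) + BASE**-len(p))`.

```python
from fractions import Fraction
import string

# Assumption: strings are drawn from 'a'..'z' only (k = 26 symbols).
ALPHABET = string.ascii_lowercase
BASE = len(ALPHABET) + 1  # digit 0 is reserved, so the base is k + 1


def to_rational(s: str) -> Fraction:
    """Map s = c1 c2 ... cn to sum(d(ci) / BASE**i) with d(c) in 1..k.

    Since no character maps to digit 0, a proper prefix maps strictly below
    every extension of it, so the mapping preserves lexicographic order.
    """
    value = Fraction(0)
    scale = Fraction(1)
    for ch in s:
        digit = ALPHABET.index(ch) + 1  # raises ValueError outside the alphabet
        scale /= BASE
        value += digit * scale
    return value


def prefix_interval(p: str) -> tuple[Fraction, Fraction]:
    """Half-open [lo, hi) holding exactly the images of strings that start with p."""
    lo = to_rational(p)
    return lo, lo + Fraction(1, BASE ** len(p))


def box_query(points, prefixes):
    """Return records whose i-th attribute starts with prefixes[i].

    Each prefix becomes an interval, and together the intervals form a query
    rectangle. A linear scan stands in for the R-tree search here.
    """
    box = [prefix_interval(p) for p in prefixes]
    hits = []
    for record, coords in points:
        if all(lo <= x < hi for x, (lo, hi) in zip(coords, box)):
            hits.append(record)
    return hits


# Hypothetical 2-dimensional data: (surname, city) pairs mapped to points.
records = [("smith", "boston"), ("smythe", "austin"),
           ("smith", "austin"), ("jones", "boston")]
points = [(r, tuple(to_rational(a) for a in r)) for r in records]

print(box_query(points, ["sm", "aus"]))
# prints [('smythe', 'austin'), ('smith', 'austin')]
```

Exact `Fraction` arithmetic keeps the sketch free of floating-point rounding for long strings. It says nothing about how the paper stores these values in index pages.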