We propose a novel string search algorithm for data stored once and read many times. Our search method combines the sublinear traversal of the record (as in Boyer-Moore or Knuth-Morris-Pratt) with the agglomeration of parts of the record and of the search pattern into a single character, the algebraic signature, in the manner of Karp-Rabin. In our experiments, the algorithm is up to seventy times faster on DNA data, up to eleven times faster on ASCII text, and up to six times faster on XML documents than an implementation of Boyer-Moore. To obtain this speed-up, we store records in encoded form, where each original character is replaced with an algebraic signature. Our method applies to records stored in databases in general and to distributed implementations of a Database as a Service (DAS) in particular. Clients send both records for insertion and search patterns already in encoded form, so servers never operate on records in clear text. No one at a node can inadvertently discover the content of the stored data.
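For readers unfamiliar with the Karp-Rabin idea the abstract builds on, the following is a minimal sketch of the classic rolling-hash search, in which a window of the record and the pattern are each condensed into a single number before comparison. It is not the algorithm proposed here: it uses an ordinary polynomial hash modulo a prime rather than algebraic signatures, it scans every position rather than skipping sublinearly, and it works on clear text. The function name `karp_rabin_search` and the parameters `base` and `mod` are illustrative assumptions.

```python
def karp_rabin_search(text: str, pattern: str,
                      base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every index at which `pattern` occurs in `text`.

    Illustrative Karp-Rabin sketch; `base` and `mod` are assumed values,
    not parameters taken from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for i in range(n - m + 1):
        # Equal hashes only suggest a match; confirm by direct comparison.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        # Slide the window: drop text[i], append text[i + m].
        if i < n - m:
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches


print(karp_rabin_search("GATTACAGATTACA", "TTAC"))
```

The explicit comparison after a hash hit guards against collisions, since two different windows can share the same hash value.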