We present several algorithms for searching data bases that consist of text. The algorithms apply mostly to very large data bases that are difficult to structure.

We describe algorithms that search the original data base without transformation and hence could be used as general text searching algorithms. We also describe algorithms that require preprocessing, the best of which achieve logarithmic behaviour. These efficient algorithms solve the "plagiarism" problem among n papers.

The problems of misspellings, ambiguous spellings, simple errors, endings, positional information, etc., are handled using signature functions.
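The abstract does not name the preprocessed structure that gives logarithmic behaviour. As an illustration only, one common way to get a number of comparisons logarithmic in the text length is to sort the starting positions of all suffixes once and then binary-search them for each query. The sketch below assumes that approach. The function names `build_suffix_index` and `find_occurrences` are hypothetical, and the naive sort stands in for the preprocessing step rather than being an efficient construction.

```python
def build_suffix_index(text: str) -> list[int]:
    """Preprocessing (assumed approach): sort suffix start positions lexicographically."""
    # Naive construction, adequate for a small illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, index: list[int], pattern: str) -> list[int]:
    """Binary-search the sorted suffixes for those that start with `pattern`."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in index[start:lo] begins with the pattern.
    return sorted(index[start:lo])


if __name__ == "__main__":
    text = "abracadabra"
    index = build_suffix_index(text)
    print(find_occurrences(text, index, "abra"))
```

Each query performs a number of string comparisons proportional to the logarithm of the text length, and every comparison inspects at most as many characters as the pattern contains. The same sorted index also places suffixes that share long prefixes next to each other, which is one way repeated passages across documents could be detected.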