We consider the problem of fuzzy full-text search in large text collections, that is, full-text search that is robust against errors in both the query and the documents. Standard inverted-index techniques work extremely well for ordinary full-text search but fail to achieve interactive query times (below 100 milliseconds) for fuzzy full-text search, even on moderately sized text collections (more than 10 GB of text). We present new preprocessing techniques that achieve interactive query times on large text collections (100 GB of text, served by a single machine). We consider two similarity measures: one where the query terms match similar terms in the collection (e.g., algorithm matches algoritm or vice versa), and one where the query terms match terms with a similar prefix in the collection (e.g., alori matches algorithm). The latter is important when results should be displayed instantly after each keystroke (search as you type). All algorithms have been fully integrated into the CompleteSearch engine.
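The two similarity measures can be illustrated with a minimal sketch. It assumes that "similar" means Levenshtein (edit) distance, which the abstract does not state explicitly, and it uses the textbook dynamic-programming recurrence rather than the preprocessing techniques of the paper. The function names `edit_distance_row`, `word_distance`, and `prefix_distance` are hypothetical and chosen only for this illustration.

```python
def edit_distance_row(query: str, term: str) -> list[int]:
    """Return the last dynamic-programming row for query vs. term.

    row[j] is the Levenshtein distance between the whole query and
    the first j characters of term (assumed similarity measure).
    """
    # Distance from the empty query to term[:j] is j insertions.
    prev = list(range(len(term) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]  # distance from query[:i] to the empty prefix
        for j, tc in enumerate(term, start=1):
            cost = 0 if qc == tc else 1
            cur.append(min(prev[j] + 1,          # delete qc
                           cur[j - 1] + 1,       # insert tc
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev


def word_distance(query: str, term: str) -> int:
    """Whole-word measure: distance between query and the full term."""
    return edit_distance_row(query, term)[-1]


def prefix_distance(query: str, term: str) -> int:
    """Prefix measure: smallest distance between query and any prefix of term."""
    return min(edit_distance_row(query, term))


if __name__ == "__main__":
    print(word_distance("algorithm", "algoritm"))
    print(prefix_distance("alori", "algorithm"))
    print(word_distance("alori", "algorithm"))
```

Under the whole-word measure, alori is far from algorithm because four characters are missing, whereas under the prefix measure it is within one edit of the prefix algori. This is why the prefix measure suits search as you type: a partially typed, slightly misspelled query can still match the term the user is heading toward. Computing these distances against every term in a large collection would be far too slow for interactive use, which is the gap the preprocessing techniques are meant to close.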