Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient phrase querying with an auxiliary index
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Fast phrase querying with combined indexes
ACM Transactions on Information Systems (TOIS)
A search engine for natural language applications
WWW '05 Proceedings of the 14th international conference on World Wide Web
Journal of the ACM (JACM)
Optimizing scoring functions and indexes for proximity search in type-annotated corpora
Proceedings of the 15th international conference on World Wide Web
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Lightweight natural language text compression
Information Retrieval
ACM Computing Surveys (CSUR)
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Heavy-tailed distributions and multi-keyword queries
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Introduction to Information Retrieval
Data extraction from the web using wild card queries
Proceedings of the 18th ACM conference on Information and knowledge management
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Hi-index | 0.00 |
Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase. In this paper, we study some of the index structures that are capable of answering the class of queries referred to here as wild card queries and perform an analysis of their performance. Our experimental results on a large class of queries from different sources (including query logs and parse trees) and with various datasets reveal some of the performance barriers of these indexes. We then present Word Permuterm Index (WPI) which is an adaptation of the permuterm index for natural language text applications and show that this index supports a wide range of wild card queries, is quick to construct and is highly scalable. Our experimental resultS comparing WPI to alternative methods on a wide range oF wild card queries show a few orders of magnitude performancE improvements for WPI while the memory usage is kept the same for all compared systems.