Analysis of a very large web search engine query log
ACM SIGIR Forum
Efficient phrase querying with an auxiliary index
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Fast phrase querying with combined indexes
ACM Transactions on Information Systems (TOIS)
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Efficient document retrieval in main memory
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Index compression is good, especially for random access
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Challenges in building large-scale information retrieval systems: invited talk
Proceedings of the Second ACM International Conference on Web Search and Data Mining
#TwitterSearch: a comparison of microblog search and web search
Proceedings of the fourth ACM international conference on Web search and data mining
Structured index organizations for high-throughput text querying
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
To index or not to index: time-space trade-offs in search engines with positional ranking functions
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Reordering an index to speed query processing without loss of effectiveness
Proceedings of the Seventeenth Australasian Document Computing Symposium
Hi-index | 0.00 |
A large proportion of search engine queries contain phrases,namely a sequence of adjacent words. In this paper, we propose to use flat position index (a.k.a schema-independent index) for phrase query evaluation. In the flat position index, the entire document collection is viewed as a huge sequence of tokens. Each token is represented by one flat position, which is a unique position offset from the beginning of the collection. Each indexed term is associated with a list of the flat positions about that term in the sequence. To recover DocID from flat positions efficiently, we propose a novel cache sensitive look-up table (CSLT), which is much faster than existing search algorithms. Experiments on TREC GOV2 data collection show that flat position index can reduce the index size and speed up phrase querying substantially, compared with traditional word-level index.