There has been a substantial amount of research on high-performance algorithms for constructing an inverted text index. However, constructing the inverted index in an intranet search engine is only the final step in a more complicated index build process. Among other things, this process requires an analysis of all the data being indexed to compute measures like PageRank. The time to perform this global analysis step is significant compared to the time to construct the inverted index, yet it has not received much attention in the research literature. In this paper, we describe how the use of slightly outdated information from global analysis and a fast index construction algorithm based on radix sorting can be combined in a novel way to significantly speed up the index build process without sacrificing search quality.
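
The abstract does not spell out the construction algorithm, so the following is only a generic sketch of the idea behind sort-based index construction with a radix sort, not the paper's method. It assumes that terms and documents have already been mapped to non-negative integer IDs below 2^32; the names `radix_sort` and `build_inverted_index` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Posting = Tuple[int, int]  # (term_id, doc_id), both assumed to fit in 32 bits


def radix_sort(items: List[Posting], key: Callable[[Posting], int],
               key_bits: int = 32, radix_bits: int = 8) -> List[Posting]:
    """Stable least-significant-digit radix sort on a non-negative integer key."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        # Distribute items into buckets by the current digit, preserving order.
        buckets: List[List[Posting]] = [[] for _ in range(mask + 1)]
        for item in items:
            buckets[(key(item) >> shift) & mask].append(item)
        items = [item for bucket in buckets for item in bucket]
    return items


def build_inverted_index(docs: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Map each term ID to the sorted list of document IDs that contain it."""
    # Emit one posting per distinct term in each document.
    postings = [(term_id, doc_id)
                for doc_id, terms in docs.items()
                for term_id in set(terms)]
    # Sort by the secondary key first, then by the primary key; because each
    # pass is stable, the result is ordered by (term_id, doc_id).
    postings = radix_sort(postings, key=lambda p: p[1])
    postings = radix_sort(postings, key=lambda p: p[0])
    index: Dict[int, List[int]] = {}
    for term_id, doc_id in postings:
        index.setdefault(term_id, []).append(doc_id)
    return index


if __name__ == "__main__":
    # Toy collection: document ID -> list of term IDs (illustrative data only).
    toy_docs = {2: [5, 1], 1: [1, 7, 1], 3: [5]}
    print(build_inverted_index(toy_docs))
```

Radix sorting suits this step because the keys are fixed-width integers, so the sort runs in a fixed number of linear passes rather than relying on comparisons. A production index build would additionally work on disk-resident runs and compressed posting lists, which this sketch leaves out.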