We measure the WT10g test collection, used in the TREC-9 and TREC 2001 Web Tracks, and the .GOV test collection, used in the TREC 2002 Web and Interactive Tracks, with measures common in the web topology community, in order to see whether these collections "look like" the web. This is not an idle question: characteristics of the web such as power-law relationships, diameter, and connected components have all been observed in general web crawls constructed by blindly following links. The .GOV collection is a fairly straightforward 18 GB crawl of sites in the .gov domain. In contrast, WT10g was carved out of a much larger crawl specifically to be a web search test collection within the reach of university researchers. Do such collections retain the properties of the larger web? In the case of WT10g and .GOV, yes.
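To make the kind of measurement concrete, the following is a minimal sketch of two of the topology measures named above: the in-degree distribution (whose log-log slope gives a rough power-law exponent) and the sizes of weakly connected components. It is an illustration under stated assumptions, not the procedure used in the paper. The function names and the toy edge list are hypothetical, the link graph is assumed to be available as (source, destination) pairs, and the least-squares fit on log-log counts is only a crude estimator of a power-law exponent.

```python
import math
from collections import Counter


def in_degree_histogram(edges):
    """Map each in-degree k >= 1 to the number of pages with that in-degree."""
    in_deg = Counter(dst for _, dst in edges)
    return Counter(in_deg.values())


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree).

    For a power law count ~ k^(-gamma), the slope approximates -gamma.
    This is a rough estimate, assumed here only for illustration.
    """
    pts = [(math.log(k), math.log(c)) for k, c in histogram.items() if k > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


def weak_component_sizes(edges):
    """Sizes of weakly connected components, largest first (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


# Hypothetical toy link graph: (source page, destination page).
toy_edges = [("a", "b"), ("c", "b"), ("b", "d"), ("e", "f")]
histogram = in_degree_histogram(toy_edges)
component_sizes = weak_component_sizes(toy_edges)
slope = loglog_slope(histogram)
```

On a real collection the edge list would come from the extracted hyperlinks, and the fitted slope and the share of pages in the largest component could then be compared against the values reported for general web crawls.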