Generating unambiguous URL clusters from web search

Authors: G. Smith; T. Brailsford; C. Donner; D. Hooijmaijers; M. Truran; J. Goulding; H. Ashman
Affiliations: University of South Australia; University of Nottingham; University of South Australia; University of South Australia; University of Teesside; University of Nottingham; University of South Australia
Venue: Proceedings of the 2009 workshop on Web Search Click Data
Year: 2009

Abstract

This paper reports on the generation of unambiguous clusters of URLs from the clickthrough data in the MSN search query log excerpt (the RFP 2006 dataset). Selections (clickthroughs) made by a single user for a single query can be assumed to have some mutual semantic relevance, and the URLs coselected in this way can be aggregated to form single-sense clusters. When the coselection graph for a single term separates into distinct clusters, those clusters can be interpreted as disambiguated aggregations of URLs, each corresponding to one sense of the term. This principle has previously been tested on smaller, more constrained datasets; this paper reports findings from applying a method based on it to the RFP 2006 dataset. The proposed coselection method for generating single-sense clusters is evaluated against two other methods, with varying parameters. The evaluation combines a human assessment of the quality of the clusters generated by each method with a simple "edit distance" analysis of how much the methods' clusters differ in content. The main questions addressed are i) whether it is feasible to generate single-sense (sense-coherent) clusters, and ii) whether, in a closed world, it would be feasible to discover ambiguous terms. The experiments showed that sense-coherent clusters were found, and further indicated that ambiguous terms could be detected by observing a small overlap between large clusters.
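The coselection principle can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a click log of (user, query, url) tuples, where the user field could equally be a session identifier. It treats two URLs as coselected when the same user clicked both for the same query containing the term, and it takes clusters to be the connected components of the resulting coselection graph. The function names `coselection_clusters` and `looks_ambiguous`, the tuple schema, the toy `.example` URLs, and the cluster-size threshold are all hypothetical.

```python
from itertools import combinations


def coselection_clusters(clicks, term, min_size=1):
    """Cluster URLs clicked for queries containing `term` (hypothetical helper).

    clicks: iterable of (user, query, url) tuples (assumed schema).
    Two URLs are linked when the same user clicked both for the same query;
    clusters are the connected components of that coselection graph.
    """
    term = term.lower()
    # Collect the set of URLs selected for each (user, query) pair.
    selections = {}
    for user, query, url in clicks:
        if term in query.lower().split():
            selections.setdefault((user, query), set()).add(url)

    # Build an undirected coselection graph as an adjacency map.
    graph = {}
    for urls in selections.values():
        for url in urls:
            graph.setdefault(url, set())  # keep single clicks as nodes
        for a, b in combinations(sorted(urls), 2):
            graph[a].add(b)
            graph[b].add(a)

    # Find connected components with an iterative depth-first search.
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) >= min_size:
            clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


def looks_ambiguous(clusters, min_cluster_size=2):
    """Hypothetical heuristic: flag a term whose URLs split into two or more
    sizeable, mutually disconnected clusters."""
    large = [c for c in clusters if len(c) >= min_cluster_size]
    return len(large) >= 2


# Toy data (invented for illustration): two users click car pages,
# two users click wildlife pages, all for queries containing "jaguar".
TOY_CLICKS = [
    ("u1", "jaguar car", "cars-a.example/jaguar"),
    ("u1", "jaguar car", "cars-b.example/jaguar"),
    ("u2", "jaguar price", "cars-a.example/jaguar"),
    ("u2", "jaguar price", "cars-c.example/jaguar"),
    ("u3", "jaguar animal", "wildlife-a.example/jaguar"),
    ("u3", "jaguar animal", "wildlife-b.example/jaguar"),
    ("u4", "jaguar habitat", "wildlife-b.example/jaguar"),
    ("u4", "jaguar habitat", "wildlife-c.example/jaguar"),
]

if __name__ == "__main__":
    found = coselection_clusters(TOY_CLICKS, "jaguar")
    for cluster in found:
        print(sorted(cluster))
    print("ambiguous:", looks_ambiguous(found))
```

Connected components are the simplest reading of "the graph separates into distinct clusters". On a log the size of RFP 2006, a practical version would likely also need edge-weight thresholds, so that a single spurious coselection cannot merge two senses into one component.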