This paper reports on the generation of unambiguous clusters of URLs from clickthrough data in the MSN search query log excerpt (the RFP 2006 dataset). Selections (clickthroughs) made by a single user for a single query can be assumed to have some mutual semantic relevance, and the URLs coselected in this way can be aggregated to form single-sense clusters. When the graph for a single term separates into distinct clusters, each cluster can be interpreted as a disambiguated aggregation of URLs. This principle has previously been tested on smaller and more constrained datasets; this paper reports findings from applying a method based on it to the RFP 2006 dataset. The proposed coselection method for generating single-sense clusters is evaluated against two other methods, with varying parameters. The evaluation combines a human assessment of the quality of the clusters generated by the different methods with a simple "edit distance" analysis of how much the content of those clusters differs between methods. The main questions addressed are i) whether it is feasible to generate single-sense (sense-coherent) clusters, and ii) whether, in a closed world, it would be feasible to discover ambiguous terms. The experiments showed that sense-coherent clusters were found, and further indicated that ambiguous terms could be detected by observing small overlap between large clusters.
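The coselection principle described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the clustering procedure or its parameters, so plain connected components over a coselection graph are used here as a simple stand-in. The record layout `(query_id, query_text, url)`, the function names, the `min_size` threshold, and all sample URLs are hypothetical. Because connected components are disjoint by construction, "small overlap between large clusters" is approximated here as "two or more large clusters for the same term".

```python
from collections import defaultdict


def coselection_clusters(clicks, term):
    """Group URLs that were coselected for queries containing `term`.

    `clicks` is an iterable of (query_id, query_text, url) tuples, where
    query_id identifies one query issued by one user (assumed layout).
    Returns a list of URL sets, largest first.
    """
    # Collect the URLs clicked for each individual query instance.
    urls_by_query = defaultdict(set)
    for query_id, query_text, url in clicks:
        if term in query_text.lower().split():
            urls_by_query[query_id].add(url)

    # Union-find over URLs: coselected URLs end up in the same component.
    parent = {}

    def find(url):
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]  # path halving
            url = parent[url]
        return url

    for urls in urls_by_query.values():
        if len(urls) < 2:
            continue  # a single click carries no coselection evidence
        first, *rest = sorted(urls)
        for other in rest:
            root_a, root_b = find(first), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    components = defaultdict(set)
    for url in list(parent):
        components[find(url)].add(url)
    # Deterministic order: largest cluster first, ties broken by content.
    return sorted(components.values(), key=lambda c: (-len(c), sorted(c)))


def looks_ambiguous(clusters, min_size=3):
    """Crude proxy: a term with two or more large clusters may be ambiguous."""
    return sum(1 for c in clusters if len(c) >= min_size) >= 2


# Hypothetical click records, for illustration only.
SAMPLE_CLICKS = [
    ("q1", "jaguar", "cars.example/jaguar"),
    ("q1", "jaguar", "dealer.example/xk"),
    ("q2", "jaguar car", "dealer.example/xk"),
    ("q2", "jaguar car", "reviews.example/xj"),
    ("q3", "jaguar", "zoo.example/jaguar"),
    ("q3", "jaguar", "wiki.example/big-cats"),
    ("q4", "jaguar habitat", "wiki.example/big-cats"),
    ("q4", "jaguar habitat", "nature.example/rainforest"),
    ("q5", "python tutorial", "docs.example/python"),
    ("q5", "python tutorial", "learn.example/python"),
]

if __name__ == "__main__":
    for cluster in coselection_clusters(SAMPLE_CLICKS, "jaguar"):
        print(sorted(cluster))
```

In this toy data the "jaguar" clicks separate into a car-related component and an animal-related component, which is the kind of separation the abstract interprets as distinct senses. A real query log would need noise handling (for example, a minimum coselection count per edge), which this sketch omits.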