A new approach to search result clustering and labeling

Authors:
Anil Turel; Fazli Can
Affiliations:
Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara, Turkey
Venue:
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Year:
2011


Abstract

Search engines present query results as a long ordered list of web snippets spread over several pages. Post-processing retrieval results to make the desired information easier to reach is an important research problem. In this paper, we present a novel search result clustering approach that splits the long list of documents returned by a search engine into meaningfully grouped and labeled clusters. Our method emphasizes clustering quality by using cover-coefficient-based and sequential k-means clustering algorithms. We also introduce a cluster labeling method based on term weighting so that labels reflect cluster contents. In addition, we present a new metric that employs precision and recall to assess the success of cluster labeling. We compare the proposed method with two prominent search result clustering methods, Suffix Tree Clustering and Lingo. Experimental results on the publicly available AMBIENT and ODP-239 datasets show that our method performs both the clustering and the labeling tasks successfully.
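
The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names: labeling a cluster by its highest-weighted terms, and scoring a label with precision and recall. The weighting scheme (term frequency times a smoothed inverse cluster frequency), the rule that a snippet "matches" a label if it contains any label term, and the function names `label_clusters` and `label_precision_recall` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def label_clusters(clusters, top_n=2):
    """Pick the top_n highest-weighted terms of each cluster as its label.

    clusters: list of clusters, each a list of snippet strings.
    Weight (assumed): term frequency in the cluster times
    log((n_clusters + 1) / number of clusters containing the term).
    """
    # Term frequencies per cluster, using naive whitespace tokenization.
    tfs = [Counter(w for doc in cluster for w in doc.lower().split())
           for cluster in clusters]
    # In how many clusters does each term occur?
    cluster_freq = Counter(term for tf in tfs for term in tf)
    n = len(clusters)

    labels = []
    for tf in tfs:
        weights = {t: f * math.log((n + 1) / cluster_freq[t])
                   for t, f in tf.items()}
        # Sort by descending weight; break ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        labels.append(ranked[:top_n])
    return labels


def label_precision_recall(label_terms, clusters, idx):
    """Treat a label as a query over all snippets (assumed matching rule).

    precision: matched snippets inside cluster idx / all matched snippets.
    recall:    matched snippets inside cluster idx / size of cluster idx.
    """
    def matches(doc):
        words = set(doc.lower().split())
        return any(t in words for t in label_terms)

    matched_in = sum(matches(d) for d in clusters[idx])
    matched_all = sum(matches(d) for c in clusters for d in c)
    precision = matched_in / matched_all if matched_all else 0.0
    recall = matched_in / len(clusters[idx]) if clusters[idx] else 0.0
    return precision, recall


clusters = [
    ["jaguar car dealer", "jaguar car engine"],
    ["jaguar cat habitat", "jaguar cat diet"],
]
labels = label_clusters(clusters, top_n=1)
scores = label_precision_recall(labels[0], clusters, 0)
```

In this toy input, the query term "jaguar" occurs in every cluster and is therefore down-weighted, while terms concentrated in one cluster rise to the top. A label that matches snippets from other clusters lowers precision, and a label that misses snippets in its own cluster lowers recall.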