A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results

Authors:
Takazumi Matsumoto;Edward Hung
Affiliations:
Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Venue:
Journal of Intelligent Information Systems
Year:
2012

Abstract

This paper details a modular, self-contained web search results clustering system that enhances search results by (i) clustering the lists of web documents returned by search engine queries, and (ii) ranking the results and labeling the resulting clusters, using a calculated relevance value as the degree of membership in each cluster. In addition, we demonstrate an external evaluation method based on precision for comparing fuzzy clustering techniques, as well as internal measures suitable for working on non-training data. The built-in label generator uses the membership degrees and relevance values to weight the most relevant results more heavily. The membership degrees of documents in fuzzy clusters also facilitate effective detection and removal of overly similar clusters. To achieve this, our transduction-based clustering algorithm (TCA) and its fuzzy counterpart (FTCA) employ a transduction-based relevance model (TRM) to consider local relationships between web documents. Tests on five different real-world and synthetic datasets show favorable results compared to the established label-based clustering algorithms Suffix Tree Clustering (STC) and Lingo.
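
The abstract does not say how membership degrees are used to detect overly similar clusters or to weight labels, so the following is only an illustrative sketch and not the paper's TCA/FTCA or TRM. It assumes each fuzzy cluster is a mapping from document id to membership degree, uses fuzzy Jaccard similarity as a stand-in overlap measure, and scores candidate label terms by summing the membership degrees of the documents that contain them. The function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity of two clusters given as {doc_id: membership}."""
    docs = set(a) | set(b)
    inter = sum(min(a.get(d, 0.0), b.get(d, 0.0)) for d in docs)
    union = sum(max(a.get(d, 0.0), b.get(d, 0.0)) for d in docs)
    return inter / union if union else 0.0


def drop_similar_clusters(clusters, threshold=0.8):
    """Keep clusters in order, skipping any that overlaps a kept one too much.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for cluster in clusters:
        if all(fuzzy_jaccard(cluster, k) < threshold for k in kept):
            kept.append(cluster)
    return kept


def weighted_label(cluster, doc_terms, top_n=2):
    """Rank candidate label terms by the summed membership of documents containing them."""
    scores = defaultdict(float)
    for doc, membership in cluster.items():
        for term in set(doc_terms.get(doc, ())):
            scores[term] += membership
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical data: three fuzzy clusters over three search results.
clusters = [
    {"d1": 0.9, "d2": 0.8},
    {"d1": 0.9, "d2": 0.7},  # nearly identical to the first cluster
    {"d3": 1.0},
]
doc_terms = {"d1": ["jaguar", "car"], "d2": ["jaguar", "cat"], "d3": ["apple"]}

kept = drop_similar_clusters(clusters)
labels = [weighted_label(c, doc_terms) for c in kept]
```

In this toy setup the second cluster is dropped as a near-duplicate of the first, and terms shared by high-membership documents rank first as labels, which mirrors the idea of weighting the most relevant results more heavily.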