A K-means approach based on concept hierarchical tree for search results clustering

Authors:
Peng Jiang;Chunxia Zhang;Guisuo Guo;Zhendong Niu;Dongping Gao
Affiliations:
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China (all five authors)
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Year:
2009

Abstract

Search results clustering aims to facilitate users' information retrieval and query refinement by grouping, online, similar documents returned by a search engine. The task places stringent requirements on performance and on the meaningfulness of cluster labels. As a result, most existing clustering algorithms, such as K-means and agglomerative hierarchical clustering, cannot be applied directly to online search results clustering. In this paper, we propose a K-means approach based on a concept hierarchical tree to cluster search results. The algorithm overcomes two weaknesses of the classic K-means method, namely that the results depend on the initial seeds and that the parameter k is often unknown in advance, and it also satisfies the requirements of online search results clustering. Our method exploits the semantic relations among documents by mapping terms to concepts in the concept hierarchical tree, which can be constructed from WordNet. We have developed a meta-search and clustering system based on our approach and evaluated it with an objective and repeatable evaluation method. Experimental results indicate that the proposed algorithm is effective and suitable for clustering search results.
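The term-to-concept mapping mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy `PARENT`, the helpers `to_concept`, `concept_vector` and `cosine`, and the `levels` parameter are all hypothetical, and a small hand-written tree stands in for one built from WordNet. The sketch only shows why two documents with no terms in common can still look similar once their terms are replaced by shared ancestor concepts; it does not cover how the paper selects seeds or the value of k.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy hierarchy (child -> parent). The paper builds its tree
# from WordNet; this hand-written dictionary is only a stand-in.
PARENT = {
    "jaguar": "feline",
    "leopard": "feline",
    "feline": "animal",
    "sedan": "car",
    "coupe": "car",
    "car": "vehicle",
}


def to_concept(term, levels=1):
    """Climb up to `levels` steps toward the root; unknown terms stay as-is."""
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term


def concept_vector(tokens, levels=1):
    """Bag-of-concepts representation of a tokenized document."""
    return Counter(to_concept(t, levels) for t in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc_a = ["jaguar", "sedan"]
doc_b = ["leopard", "coupe"]

# Term level: the two documents share no terms.
term_similarity = cosine(Counter(doc_a), Counter(doc_b))
# Concept level: both documents map to the concepts "feline" and "car".
concept_similarity = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(term_similarity, concept_similarity)
```

Vectors of this kind could then be passed to any K-means routine. How far up the tree to climb (`levels` here) is a design choice made for this sketch, not something the abstract specifies.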