A Maximal Frequent Itemset Approach for Web Document Clustering

Authors:
Ling Zhuang;Honghua Dai
Affiliations:
Deakin University;Deakin University
Venue:
CIT '04 Proceedings of the Fourth International Conference on Computer and Information Technology
Year:
2004

Citing 0
Cited 2

An efficient algorithm for topic ranking and modeling topic evolution
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I

An optimized k-means algorithm of reducing cluster intra-dissimilarity for document clustering
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management

Abstract

Clustering web documents efficiently and accurately is of great interest to web users and is a key component of the search accuracy of a web search engine. To this end, this paper introduces a new approach to web document clustering, called the Maximal Frequent Itemset (MFI) approach. Iterative clustering algorithms, such as K-means and Expectation-Maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Experimental results on three web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the document sets contain a large number of categories.
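
The abstract does not say how center points are derived from the maximal frequent itemsets, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each document is a set of terms, mines maximal frequent itemsets with a brute-force level-wise search, and takes the centroid of the documents covering each itemset as an initial K-means center. All function names, the toy documents, and the `min_support` value are hypothetical.

```python
from itertools import combinations

def maximal_frequent_itemsets(docs, min_support):
    """Level-wise (Apriori-style) search; adequate for toy data only."""
    items = sorted({term for doc in docs for term in doc})
    current = [frozenset([term]) for term in items]
    frequent = []
    while current:
        # Keep candidates contained in at least min_support documents.
        level = [s for s in current if sum(s <= doc for doc in docs) >= min_support]
        frequent.extend(level)
        # Join frequent k-itemsets that differ in exactly one term.
        size = len(level[0]) + 1 if level else 0
        current = list({a | b for a, b in combinations(level, 2) if len(a | b) == size})
    # An itemset is maximal if it has no frequent proper superset.
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return sorted(maximal, key=lambda s: (-len(s), sorted(s)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mfi_initial_centers(docs, vectors, mfis, k):
    """Assumed heuristic: centroid of the documents covering each of the first k MFIs."""
    centers = []
    for itemset in mfis[:k]:
        cover = [vec for doc, vec in zip(docs, vectors) if itemset <= doc]
        centers.append(mean(cover))
    return centers

def kmeans(vectors, centers, max_iter=20):
    """Plain K-means starting from the supplied centers."""
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: sqdist(vec, centers[c]))
                  for vec in vectors]
        updated = []
        for c, old in enumerate(centers):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            updated.append(mean(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return labels, centers

# Toy corpus (hypothetical): three programming documents, three sports documents.
docs = [
    {"python", "code", "bug"},
    {"python", "code", "loop"},
    {"python", "code", "test"},
    {"soccer", "goal", "team"},
    {"soccer", "goal", "match"},
    {"soccer", "goal", "cup"},
]
vocab = sorted(set().union(*docs))
vectors = [[1.0 if term in doc else 0.0 for term in vocab] for doc in docs]

mfis = maximal_frequent_itemsets(docs, min_support=3)
centers = mfi_initial_centers(docs, vectors, mfis, k=2)
labels, centers = kmeans(vectors, centers)
print(labels)
```

On this toy corpus the two maximal frequent itemsets are {code, python} and {goal, soccer}, and the first three documents end up in one cluster and the last three in the other. Because the starting centers come from term co-occurrence rather than random sampling, the result does not depend on a random seed, which is the sensitivity to initial conditions that the abstract sets out to address.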