Document Clustering Using Incremental and Pairwise Approaches

Authors:
Tien Tran; Richi Nayak; Peter Bruza
Affiliations:
Information Technology, Queensland University of Technology, Brisbane, Australia; Information Technology, Queensland University of Technology, Brisbane, Australia; Information Technology, Queensland University of Technology, Brisbane, Australia
Venue:
Focused Access to XML Documents
Year:
2008

Abstract

This paper presents the experiments and results of an approach to clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The approach combines an incremental clustering method with a pairwise clustering method. The incremental method first reduces the dataset to a smaller set of clusters whose number is not fixed in advance, which makes the clustering task feasible on a large dataset. The pairwise method then clusters this reduced set into the required number of clusters. In this way, the large number of documents is clustered successfully and an accurate clustering solution is obtained.
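
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the document representation, the similarity measure, or the stopping rules, so everything concrete here is an assumption made for illustration: documents are plain numeric vectors, similarity is cosine similarity against cluster centroids, the incremental stage opens a new cluster whenever no centroid reaches a `threshold`, and the pairwise stage repeatedly merges the two most similar clusters until `k` remain. The function names (`incremental_stage`, `pairwise_stage`) are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def incremental_stage(docs, threshold):
    """Single pass over the documents.

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster. The number of
    clusters produced is therefore not fixed in advance.
    """
    clusters = []   # each cluster is a list of document indices
    centroids = []  # centroids[c] is the mean vector of clusters[c]
    for i, vec in enumerate(docs):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            sim = cosine(vec, cen)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(vec))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([docs[j] for j in clusters[best]])
    return clusters


def pairwise_stage(docs, clusters, k):
    """Merge the most similar pair of clusters until only k remain."""
    clusters = [list(c) for c in clusters]  # do not mutate the input
    while len(clusters) > k:
        cents = [centroid([docs[j] for j in c]) for c in clusters]
        best_pair, best_sim = (0, 1), -1.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Toy vectors standing in for document feature vectors.
    docs = [[1, 0, 0], [0.9, 0.1, 0], [0.7, 0.7, 0],
            [0.6, 0.8, 0], [0, 0, 1], [0.1, 0, 0.9]]
    reduced = incremental_stage(docs, threshold=0.95)
    print(pairwise_stage(docs, reduced, k=2))
```

The point of the split is cost: the pairwise stage compares every pair of clusters, which is only practical once the single-pass stage has shrunk the collection to a much smaller number of groups. The threshold value and the centroid-based merging are choices of this sketch, not details confirmed by the abstract.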