Focused Access to XML Documents: 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007 Dagstuhl Castle, Germany, December 17-19, 2007. Selected Papers
This paper presents the experiments and results of an approach to clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The approach combines an incremental clustering method with a pairwise clustering method. The incremental method first reduces the dataset to a set of clusters whose number is not fixed in advance. This smaller set is then clustered into the required number of clusters using the pairwise method. In this way, the large document collection is clustered successfully while the accuracy of the clustering solution is maintained.
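The two-stage idea can be illustrated with a minimal sketch. The abstract does not specify the document representation, the similarity measure, or how either stage decides on a merge, so everything below is an assumption made for illustration: documents are plain numeric feature vectors, similarity is cosine, the incremental stage is a single-pass threshold ("leader"-style) assignment, and the pairwise stage repeatedly merges the two most similar clusters. The names `incremental_pass`, `pairwise_merge`, `two_stage_cluster`, and `threshold` are hypothetical and do not come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def incremental_pass(vectors, threshold):
    """Stage 1 (assumed form): one pass over the data.

    Each vector joins the most similar existing cluster if that similarity
    reaches `threshold`; otherwise it starts a new cluster. The number of
    clusters produced is therefore not fixed in advance.
    """
    sums, members = [], []  # per-cluster vector sum and member indices
    for i, v in enumerate(vectors):
        best, best_sim = -1, threshold
        for c, s in enumerate(sums):
            sim = cosine(v, s)  # cosine ignores scale, so the sum stands in for the mean
            if sim >= best_sim:
                best, best_sim = c, sim
        if best < 0:
            sums.append(list(v))
            members.append([i])
        else:
            sums[best] = [a + b for a, b in zip(sums[best], v)]
            members[best].append(i)
    return sums, members


def pairwise_merge(sums, members, k):
    """Stage 2 (assumed form): merge the most similar pair of clusters until k remain."""
    sums = [list(s) for s in sums]
    members = [list(m) for m in members]
    while len(sums) > k:
        best_pair, best_sim = None, -2.0
        for i in range(len(sums)):
            for j in range(i + 1, len(sums)):
                sim = cosine(sums[i], sums[j])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        sums[i] = [a + b for a, b in zip(sums[i], sums[j])]
        members[i].extend(members[j])
        del sums[j]
        del members[j]
    return members


def two_stage_cluster(vectors, threshold, k):
    """Run the incremental pass, then reduce its clusters to k by pairwise merging."""
    sums, members = incremental_pass(vectors, threshold)
    return pairwise_merge(sums, members, k)
```

The point of the split is cost: the pairwise stage compares every pair of clusters, which is only affordable once the incremental pass has shrunk the input from individual documents to a much smaller set of clusters. In this sketch the `threshold` value controls that trade-off, with a higher value leaving more clusters for the second stage.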