Frequent itemsets originate from association rule mining. Recently, they have been applied to text mining tasks such as document categorization and clustering. In this paper, we study text clustering using frequent itemsets. The contribution of this paper is threefold. First, we review existing methods of document clustering using frequent patterns. Second, we propose a new method for document clustering called Maximum Capturing. Maximum Capturing consists of two procedures: constructing document clusters and assigning cluster topics. We develop three versions of Maximum Capturing based on three similarity measures. We also propose a normalization process, based on frequency sensitive competitive learning, that merges cluster candidates into a predefined number of clusters. Third, we carry out experiments comparing the proposed method with the CFWS, CMS, FTC and FIHC methods. The experimental results show that Maximum Capturing achieves better clustering performance than these methods. In particular, Maximum Capturing with documents represented by individual words and with asymmetrical binary similarity as the similarity measure achieves the best performance. Moreover, the topics produced by Maximum Capturing distinguish clusters from each other and can be used as labels for document clusters.
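To make the idea of comparing documents through frequent itemsets concrete, here is a minimal sketch. It is not the Maximum Capturing procedure, which the abstract does not specify. It assumes that asymmetrical binary similarity is the Jaccard coefficient, the usual measure for asymmetric binary attributes, and that each document is represented by the frequent itemsets it contains. The function names, the brute-force itemset enumeration, and the toy documents are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Brute-force enumeration of word sets found in at least min_support docs.

    Assumption: suitable for toy inputs only; a real system would use an
    algorithm such as Apriori or FP-growth.
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    result = []
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for d in doc_sets if items <= d)
            if support >= min_support:
                result.append(items)
    return result


def asymmetric_binary_similarity(a, b):
    """Jaccard coefficient: shared items / items present in either set.

    Joint absences are ignored, which is what makes the measure asymmetric.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy corpus: each document is a list of words.
docs = [
    ["data", "mining", "cluster"],
    ["data", "mining", "text"],
    ["text", "cluster", "topic"],
]

itemsets = frequent_itemsets(docs, min_support=2)

# Represent each document by the frequent itemsets it fully contains.
doc_repr = [{s for s in itemsets if s <= set(d)} for d in docs]

# Pairwise similarities between documents.
sims = {
    (i, j): asymmetric_binary_similarity(doc_repr[i], doc_repr[j])
    for i, j in combinations(range(len(docs)), 2)
}
```

In this toy corpus the first two documents share the frequent itemsets {data}, {mining} and {data, mining}, so they score higher with each other than either does with the third document. A clustering step would then group the most similar documents first.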