As the number of text documents on the Internet grows explosively, hierarchical document clustering has proven useful for grouping similar documents in a wide range of applications. However, most document clustering methods still struggle with high dimensionality, scalability, accuracy, and the generation of meaningful cluster labels. In this paper, we present an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering (F^2IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the Frequent Itemset-Based Hierarchical Clustering (FIHC) method. In our approach, key terms are first extracted from the document set, and each document is pre-processed into the representation required by the subsequent mining process. Then, a fuzzy association rule mining algorithm for text is employed to discover a set of highly related fuzzy frequent itemsets, whose key terms serve as the labels of the candidate clusters. Finally, the documents are clustered into a hierarchical cluster tree by referring to these candidate clusters. We have conducted experiments to evaluate performance on the Classic4, Hitech, Re0, Reuters, and Wap datasets. The experimental results show that our approach not only retains the merits of FIHC but also improves its clustering accuracy.
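The pipeline described above (fuzzify term frequencies, mine fuzzy frequent itemsets, use them as labelled candidate clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the low/mid/high membership functions, the breakpoints 1, 3, and 5, the `min_support` threshold, the cap of two terms per itemset, and every function name are assumptions chosen for illustration, and the construction of the hierarchical cluster tree is not shown.

```python
from itertools import combinations

REGIONS = ("low", "mid", "high")


def fuzzify(tf):
    """Map a raw term frequency to assumed low/mid/high membership degrees."""
    if tf <= 0:
        return {"low": 0.0, "mid": 0.0, "high": 0.0}
    low = max(0.0, min(1.0, (3 - tf) / 2))
    mid = max(0.0, min((tf - 1) / 2, (5 - tf) / 2))
    high = max(0.0, min(1.0, (tf - 3) / 2))
    return {"low": low, "mid": mid, "high": high}


def match(doc, itemset):
    """Degree to which one document supports an itemset (minimum over its items)."""
    return min(fuzzify(doc.get(term, 0))[region] for term, region in itemset)


def support(docs, itemset):
    """Fuzzy support: average match degree over all documents."""
    return sum(match(doc, itemset) for doc in docs) / len(docs)


def fuzzy_frequent_itemsets(docs, min_support, max_size=2):
    """Enumerate itemsets of (term, region) pairs whose fuzzy support is high enough."""
    items = sorted({(t, r) for doc in docs for t in doc for r in REGIONS})
    # Prune with single items first: an itemset cannot be frequent
    # unless each of its items is frequent on its own.
    survivors = [i for i in items if support(docs, (i,)) >= min_support]
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(survivors, size):
            # Skip itemsets that use two regions of the same term.
            if len({term for term, _ in combo}) < size:
                continue
            s = support(docs, combo)
            if s >= min_support:
                frequent[combo] = s
    return frequent


def assign(docs, frequent):
    """Put each document into its best-matching candidate cluster.

    Ties are broken in favour of the longer (more specific) itemset.
    Documents that match no itemset are left unassigned.
    """
    clusters = {}
    for index, doc in enumerate(docs):
        best = max(frequent, key=lambda s: (match(doc, s), len(s)), default=None)
        if best is not None and match(doc, best) > 0:
            clusters.setdefault(best, []).append(index)
    return clusters


if __name__ == "__main__":
    # Each document is a hypothetical {key term: frequency} mapping.
    docs = [
        {"cluster": 5, "fuzzy": 5},
        {"cluster": 5, "fuzzy": 5},
        {"stream": 5},
    ]
    frequent = fuzzy_frequent_itemsets(docs, min_support=0.5)
    for label, members in assign(docs, frequent).items():
        print(label, members)
```

In this sketch each frequent itemset doubles as a cluster label, which mirrors the idea that the key terms of an itemset describe the candidate cluster. A fuller treatment would also merge and nest the candidate clusters into a tree, which is omitted here.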