With the rapid growth of text documents, document clustering has become one of the main techniques for organizing a large number of documents into a small number of meaningful clusters. However, document clustering still faces several challenges, such as high dimensionality, scalability, accuracy, meaningful cluster labels, overlapping clusters, and extracting semantics from texts. To improve the quality of document clustering results, we propose an effective Fuzzy-based Multi-label Document Clustering (FMDC) approach that integrates fuzzy association rule mining with the existing ontology WordNet to alleviate these problems. In our approach, key terms are first extracted from the document set, and the initial representation of each document is enriched with WordNet hypernyms in order to exploit the semantic relations between terms. Then, a fuzzy association rule mining algorithm for texts is employed to discover a set of highly related fuzzy frequent itemsets, whose key terms serve as the labels of the candidate clusters. Finally, each document is dispatched into more than one target cluster by referring to these candidate clusters, and highly similar target clusters are then merged. We evaluated the performance of our approach on the Classic, Re0, R8, and WebKB datasets. The experimental results show that our approach achieves higher accuracy than influential document clustering methods. In addition, our approach not only provides more general and meaningful labels for documents but also effectively generates overlapping clusters.
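The pipeline described above (hypernym enrichment, fuzzy frequent itemset mining, multi-label dispatch, merging) can be illustrated with a small sketch. This is not the authors' FMDC implementation; it is a toy under stated assumptions. The `HYPERNYMS` dictionary is a hypothetical stand-in for WordNet; term membership is simplified to frequency divided by the document's maximum frequency (the abstract does not specify the membership functions); fuzzy support uses the minimum as the t-norm; a document joins every candidate cluster whose label terms it all contains; and clusters are merged greedily when the Jaccard similarity of their document sets reaches a threshold. All function names and parameters are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for WordNet hypernyms (term -> one parent concept).
HYPERNYMS = {"cat": "animal", "dog": "animal", "car": "vehicle", "truck": "vehicle"}


def enrich(tokens):
    """Count key terms, then add each term's hypernym with the same count."""
    counts = Counter(tokens)
    for term, freq in list(counts.items()):
        parent = HYPERNYMS.get(term)
        if parent is not None:
            counts[parent] += freq
    return counts


def memberships(counts):
    """Simplified fuzzy membership: term frequency / max frequency in the document."""
    if not counts:
        return {}
    top = max(counts.values())
    return {term: freq / top for term, freq in counts.items()}


def fuzzy_support(itemset, docs):
    """Mean over documents of the minimum membership of the itemset's terms."""
    return sum(min(doc.get(t, 0.0) for t in itemset) for doc in docs) / len(docs)


def frequent_itemsets(docs, min_support, max_size=2):
    """Apriori-style level-wise search for fuzzy frequent itemsets (candidate labels)."""
    level = [frozenset([t]) for t in sorted({t for doc in docs for t in doc})]
    frequent = {}
    for size in range(1, max_size + 1):
        for s in level:
            sup = fuzzy_support(s, docs)
            if sup >= min_support:
                frequent[s] = sup
        kept = [s for s in level if s in frequent]
        # Join pairs of frequent k-itemsets into (k+1)-itemset candidates.
        level = sorted({a | b for a, b in combinations(kept, 2) if len(a | b) == size + 1}, key=sorted)
    return frequent


def dispatch(labels, docs):
    """Multi-label assignment: a document joins every cluster whose label terms it all contains."""
    return {label: {i for i, doc in enumerate(docs) if all(doc.get(t, 0.0) > 0 for t in label)}
            for label in labels}


def merge_similar(clusters, threshold):
    """Greedily merge clusters whose document sets have Jaccard similarity >= threshold."""
    merged = []  # list of (label terms, document ids)
    for label, members in clusters.items():
        for terms, ids in merged:
            union = ids | members
            if union and len(ids & members) / len(union) >= threshold:
                terms.update(label)
                ids.update(members)
                break
        else:
            merged.append((set(label), set(members)))
    return merged


def fmdc_sketch(token_lists, min_support, merge_threshold):
    docs = [memberships(enrich(tokens)) for tokens in token_lists]
    labels = frequent_itemsets(docs, min_support)
    return merge_similar(dispatch(labels, docs), merge_threshold)


DOCS = [
    ["cat", "cat", "dog", "pet"],
    ["dog", "pet", "leash"],
    ["car", "truck", "road"],
    ["car", "road", "road"],
    ["dog", "car"],
]

for terms, ids in fmdc_sketch(DOCS, min_support=0.35, merge_threshold=0.8):
    print(sorted(terms), sorted(ids))
# prints:
# ['animal', 'dog'] [0, 1, 4]
# ['car', 'vehicle'] [2, 3, 4]
```

Document 4 mentions both a dog and a car, so it lands in both resulting clusters, which is the overlapping, multi-label behavior the abstract describes. The hypernyms `animal` and `vehicle` surface as cluster labels even though no document uses them directly, which shows how enrichment can yield more general labels.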