Text clustering based on granular computing and Wikipedia

Authors:
Liping Jing;Jian Yu
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Venue:
RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology
Year:
2011

Abstract

Text clustering plays an important role in many real-world applications, but it faces several challenges, such as the curse of dimensionality, complex semantics, and large data volume. Much research has addressed these problems by designing new text representation models and clustering algorithms. However, text clustering remains an open research problem because of the complicated properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing with the aid of Wikipedia. The proposed method first identifies text granules, focusing on concepts and words, with the aid of Wikipedia. It then mines latent patterns by computing over these granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed method improves text clustering performance compared with the existing clustering algorithm combined with existing representation models.
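
The abstract does not spell out how granules are built or compared, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a tiny hand-written word-to-concept dictionary (`WORD_TO_CONCEPT`, hypothetical) in place of a real Wikipedia lookup, represents each document as a sparse vector of word granules plus concept granules, and uses cosine similarity as a stand-in for the pattern-mining step. The function names and the `concept_weight` parameter are likewise assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-concept map standing in for a Wikipedia lookup.
WORD_TO_CONCEPT = {
    "puma": "Felidae",
    "cougar": "Felidae",
    "stock": "Finance",
    "shares": "Finance",
}


def granules(text, concept_weight=1.0):
    """Build a sparse vector of word granules ("w", ...) and concept granules ("c", ...)."""
    words = text.lower().split()
    vec = Counter(("w", w) for w in words)
    for w in words:
        concept = WORD_TO_CONCEPT.get(w)
        if concept is not None:
            # Each mapped word adds weight to its concept granule.
            vec[("c", concept)] += concept_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = ["puma sighted", "cougar attack", "stock falls", "shares rise"]
vecs = [granules(d) for d in docs]
words_only = [granules(d, concept_weight=0.0) for d in docs]

# Documents 0 and 1 share no words, so only the concept granule links them.
print(cosine(words_only[0], words_only[1]))
print(cosine(vecs[0], vecs[1]))
print(cosine(vecs[0], vecs[2]))
```

In this toy setting, the two documents about the same animal share no surface words, so a word-only representation sees them as unrelated, while the added concept granule gives them nonzero similarity. Any clustering algorithm that consumes pairwise similarities could then be run on top of such vectors.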