Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures

Authors:
Wei Song; Cheng Hua Li; Soon Cheol Park
Affiliations:
Department of Electronics and Information Engineering, Chonbuk National University, Jeonju, Jeonbuk 561-756, Republic of Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

This paper proposes a self-organized genetic algorithm for text clustering based on an ontology method. A common problem in the field of text clustering is that a document is represented as a bag of words, while conceptual similarity is ignored. We take advantage of thesaurus-based and corpus-based ontologies to overcome this problem. However, the traditional corpus-based method is rather difficult to apply. A transformed latent semantic indexing (LSI) model, which can appropriately capture the associated semantic similarity, is proposed and demonstrated as a corpus-based ontology in this article. To investigate how ontology methods can be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experimental results show that our genetic algorithm combined with the ontology strategy, that is, the combination of the transformed LSI-based measure with the thesaurus-based measure, clearly outperforms the same algorithm with traditional similarity measures. Our clustering algorithm also improves performance compared with the standard GA and k-means under the same similarity measures.
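The abstract does not specify how the two similarity measures are combined or how the genetic algorithm encodes and scores a clustering, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a linear blend of two precomputed document-similarity matrices (the weight `alpha` is hypothetical), a chromosome that stores one cluster label per document, and a fitness equal to mean within-cluster similarity minus mean between-cluster similarity. The names `hybrid_similarity`, `fitness`, and `ga_cluster` are illustrative.

```python
import random


def hybrid_similarity(sim_a, sim_b, alpha=0.5):
    """Blend two similarity matrices; the linear weighting is an assumption."""
    n = len(sim_a)
    return [[alpha * sim_a[i][j] + (1 - alpha) * sim_b[i][j] for j in range(n)]
            for i in range(n)]


def _mean(values):
    # Mean of a list, defined as 0.0 for an empty list.
    return sum(values) / len(values) if values else 0.0


def fitness(assign, sim):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    within, between = [], []
    n = len(assign)
    for i in range(n):
        for j in range(i + 1, n):
            if assign[i] == assign[j]:
                within.append(sim[i][j])
            else:
                between.append(sim[i][j])
    return _mean(within) - _mean(between)


def ga_cluster(sim, k, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Plain generational GA with elitism (a stand-in, not the paper's variant).

    Requires at least two documents. Returns the best assignment found and
    the best fitness recorded at each generation.
    """
    rng = random.Random(seed)
    n = len(sim)
    # Each chromosome assigns one of k cluster labels to every document.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(a, sim), reverse=True)
        history.append(fitness(pop[0], sim))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=lambda a: fitness(a, sim))
            p2 = max(rng.sample(pop, 2), key=lambda a: fitness(a, sim))
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda a: fitness(a, sim))
    history.append(fitness(best, sim))
    return best, history
```

In use, one would compute a thesaurus-based matrix and an LSI-based matrix separately, pass them to `hybrid_similarity`, and hand the result to `ga_cluster` with the desired number of clusters. Because the best individual is always carried over, the recorded best fitness never decreases from one generation to the next.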