Parametric and nonparametric evolutionary computing with a content-based feature selection approach for parallel categorization

Authors:
Wei Song;Shi Tong Wang;Cheng Hua Li
Affiliations:
School of Information Engineering, Jiangnan University, Lihu Road, Wuxi, Jiangsu Province 214122, China;School of Information Engineering, Jiangnan University, Lihu Road, Wuxi, Jiangsu Province 214122, China;School of Information Engineering, Jiangnan University, Lihu Road, Wuxi, Jiangsu Province 214122, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2009


Abstract

This paper proposes two modified evolutionary computing methods for genetic algorithms (GAs) and demonstrates an effective content-based feature selection approach that improves clustering performance. Conventional GAs learn slowly and are prone to becoming trapped in local minima because the search space is high-dimensional. We propose a parametric and a nonparametric evolutionary algorithm to adjust the GA operators appropriately. In the parametric approach, several fuzzy control parameters are defined manually to adaptively optimize GA behavior. In the nonparametric approach, by contrast, these parameters are adjusted automatically by the GA itself. Moreover, a content-based feature selection (CFS) approach is shown to create a robust semantic space and to reduce the number of dimensions, which speeds up evolutionary computing. We also use parallel computing to improve clustering efficiency. The experimental results show that our methods enhance the performance of the standard GA and are more efficient than those implemented on a single processor. The CFS approach not only reduces the document dimensionality but also indirectly improves clustering efficiency.
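
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of letting a GA adjust its own operator settings. It is not the authors' algorithm. It assumes a common self-adaptation scheme in which each individual carries its own mutation rate, and that rate is inherited and perturbed along with the genes. The toy objective (counting 1-bits), the rate bounds, and all function names are hypothetical choices made for illustration.

```python
import random

def fitness(bits):
    # Toy objective (assumption): number of 1-bits, not a clustering criterion.
    return sum(bits)

def make_individual(rng, n_bits):
    # Each individual stores its own mutation rate next to its genes.
    return {"genes": [rng.randint(0, 1) for _ in range(n_bits)],
            "mut_rate": rng.uniform(0.01, 0.2)}

def mutate(rng, ind):
    # Perturb the inherited rate, clamp it, then use it to flip bits.
    rate = min(0.5, max(0.001, ind["mut_rate"] * rng.uniform(0.8, 1.25)))
    genes = [1 - g if rng.random() < rate else g for g in ind["genes"]]
    return {"genes": genes, "mut_rate": rate}

def crossover(rng, a, b):
    # One-point crossover on genes; the child averages its parents' rates.
    cut = rng.randrange(1, len(a["genes"]))
    return {"genes": a["genes"][:cut] + b["genes"][cut:],
            "mut_rate": (a["mut_rate"] + b["mut_rate"]) / 2}

def tournament(rng, pop, k=3):
    # Pick the fittest of k randomly sampled individuals.
    return max(rng.sample(pop, k), key=lambda ind: fitness(ind["genes"]))

def evolve(n_bits=40, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [make_individual(rng, n_bits) for _ in range(pop_size)]
    history = []  # best fitness observed at the start of each generation
    for _ in range(generations):
        best = max(pop, key=lambda ind: fitness(ind["genes"]))
        history.append(fitness(best["genes"]))
        children = [best]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            child = crossover(rng, tournament(rng, pop), tournament(rng, pop))
            children.append(mutate(rng, child))
        pop = children
    best = max(pop, key=lambda ind: fitness(ind["genes"]))
    history.append(fitness(best["genes"]))
    return best, history

if __name__ == "__main__":
    best, history = evolve()
    print("best fitness:", history[-1], "final mutation rate:", best["mut_rate"])
```

Because selection acts on whole individuals, mutation rates that tend to produce fitter offspring spread through the population without any externally tuned schedule. A parametric variant would instead update the rate from an explicit rule, such as a fuzzy controller driven by population statistics.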