Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. In this paper, we propose a new supervised feature selection method, named CHIR, which is based on the Chi-square statistic and on new statistical data that measure the positive term-category dependency. We also propose a new text clustering algorithm, TCFS, which stands for Text Clustering with Feature Selection. TCFS can incorporate CHIR to identify relevant features (i.e., terms) iteratively, so that clustering becomes a learning process. We compared TCFS and the k-means clustering algorithm in combination with different feature selection methods on various real data sets. Our experimental results show that TCFS with CHIR achieves better clustering accuracy, as measured by the F-measure and purity.
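
To make the quantities named in the abstract concrete, the following minimal sketch computes the standard Chi-square statistic for a 2x2 term/category contingency table, checks the direction of the dependency, and computes cluster purity. It is an illustration only, not the CHIR or TCFS procedure itself: the function names and the count arguments `a`, `b`, `c`, `d` are hypothetical, and comparing the observed co-occurrence count with its expected value under independence is assumed here as one simple way to detect a positive dependency.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def positively_dependent(a, b, c, d):
    """Assumed check: the term occurs in the category more often than
    independence would predict (observed count above expected count)."""
    n = a + b + c + d
    if n == 0:
        return False
    expected = (a + b) * (a + c) / n
    return a > expected


def purity(cluster_labels, true_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(true_labels)
```

The Chi-square value alone does not distinguish a term that is over-represented in a category from one that is under-represented, since both produce a large statistic. That is why the sketch pairs it with a separate direction check.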