Cluster based symbolic representation and feature selection for text classification

Authors:
B. S. Harish;D. S. Guru;S. Manjunath;R. Dinesh
Affiliations:
Department of Studies in Computer Science, University of Mysore, Mysore, India;Department of Studies in Computer Science, University of Mysore, Mysore, India;Department of Studies in Computer Science, University of Mysore, Mysore, India;Honeywell Technologies Ltd, Bangalore, India
Venue:
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Year:
2010

Abstract

In this paper, we propose a new method of representing documents based on clustering of term frequency vectors. For each class of documents, we create multiple clusters to preserve intraclass variations. The term frequency vectors of each cluster are then used to form a symbolic representation with interval-valued features. We subsequently propose a novel symbolic method for feature selection and present the corresponding symbolic text classification. To corroborate the efficacy of the proposed model, we conducted experiments on various datasets. The experimental results show that the proposed method gives better results than state-of-the-art techniques. In addition, because the method is based on a simple matching scheme, it requires negligible time.
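
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the clusters within each class are already available, that each cluster is summarized by one [min, max] interval per term, and that "simple matching" means counting how many term frequencies of a test document fall inside a cluster's intervals. The function names (`interval_representation`, `match_score`, `classify`) and the toy data are hypothetical. The symbolic feature selection step is omitted because the abstract does not describe it.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Interval = Tuple[float, float]


def interval_representation(cluster: List[Vector]) -> List[Interval]:
    """Summarize a cluster as one [min, max] interval per term (assumed choice)."""
    return [(min(col), max(col)) for col in zip(*cluster)]


def match_score(doc: Vector, intervals: List[Interval]) -> int:
    """Count the terms whose frequency lies inside the cluster's interval."""
    return sum(1 for x, (lo, hi) in zip(doc, intervals) if lo <= x <= hi)


def classify(doc: Vector, model: List[Tuple[str, List[Interval]]]) -> str:
    """Return the class label of the best-matching cluster representative."""
    best_label, _ = max(model, key=lambda entry: match_score(doc, entry[1]))
    return best_label


# Toy term frequency vectors (hypothetical). "sports" keeps two clusters
# so that its intraclass variation is not averaged away.
clusters_by_class: Dict[str, List[List[Vector]]] = {
    "sports": [
        [[5, 0, 1], [6, 1, 0]],
        [[0, 0, 7], [1, 0, 9]],
    ],
    "politics": [
        [[0, 4, 0], [1, 6, 1]],
    ],
}

# One symbolic (interval-valued) representative per cluster, tagged with its class.
model = [
    (label, interval_representation(cluster))
    for label, clusters in clusters_by_class.items()
    for cluster in clusters
]

print(classify([5, 1, 1], model))
print(classify([0, 5, 1], model))
```

Under these assumptions, classifying a document costs one pass over the terms per cluster representative, which is in line with the abstract's remark that the matching scheme is cheap.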