Expert Systems with Applications: An International Journal
Automatic text classification methods based on the vector space model (VSM), artificial neural networks (ANN), K-nearest neighbor (KNN), Naive Bayes (NB), and support vector machines (SVM) have been applied to English-language documents and have gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of VSM and ANN to the classification of Tamil-language documents. Tamil is a morphologically rich, classical Dravidian language. The growth of the internet has led to an exponential increase in the number of electronic documents, not only in English but also in regional languages. The automatic classification of Tamil documents has not been explored in detail so far. In this paper, a corpus is used to construct and test the VSM and ANN models. Methods of document representation and of assigning weights that reflect the importance of each term are discussed. In a traditional categorization system based on word matching, the most popular document representation is VSM. This method needs a high-dimensional space to represent the documents, whereas the ANN classifier requires a smaller number of features. The experimental results show that the ANN model achieves 93.33%, which is better than the 90.33% obtained by VSM on Tamil document classification.