With the growing amount of textual information available on the Internet, automatic text classification has become increasingly important over the last decade. This paper presents a system for classifying multi-class Farsi documents using a Support Vector Machine (SVM) classifier. The new idea proposed here is to extend the feature vector with words extracted from a thesaurus. The goal is to assist the classifier when the training dataset is not comprehensive for some categories. The corpus was prepared from the Farsi Wikipedia website and from articles in archived newspapers and magazines. The results indicate that this approach improves classification performance: a micro-averaged F-measure of 0.89 was achieved for classifying Farsi texts into 10 categories.
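The thesaurus-based extension of the feature vector can be illustrated with a minimal sketch. The abstract does not specify the thesaurus, the term weighting, or the SVM configuration, so everything below is an assumption made for illustration: the toy `THESAURUS` dictionary, the helper names `expand_tokens` and `to_feature_vector`, and the use of raw term counts are all hypothetical, and English words stand in for Farsi terms.

```python
from collections import Counter

# Hypothetical toy thesaurus mapping a term to related terms.
# English words are used for readability; the paper works on Farsi text.
THESAURUS = {
    "football": ["sport", "soccer"],
    "election": ["politics", "vote"],
}


def expand_tokens(tokens, thesaurus):
    """Return the original tokens followed by the thesaurus terms of each token."""
    expanded = list(tokens)
    for tok in tokens:
        # Tokens without a thesaurus entry contribute nothing extra.
        expanded.extend(thesaurus.get(tok, []))
    return expanded


def to_feature_vector(tokens, vocabulary):
    """Map a token list to raw term counts over a fixed vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


doc = ["football", "match", "election"]
expanded = expand_tokens(doc, THESAURUS)
vocabulary = sorted(set(expanded))
vector = to_feature_vector(expanded, vocabulary)
```

In this sketch a document that mentions only "football" also receives a nonzero count for "sport", so it shares a feature with other sports documents even when the training data for that category is sparse. The resulting vectors could then be passed to any SVM implementation; that step is omitted here because the abstract gives no details about it.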