KNN and SVM are two machine learning approaches to Text Categorization (TC) based on the Vector Space Model. In this model, borrowed from Information Retrieval (IR), each document is represented as a vector in which each component is associated with a particular word from the vocabulary. Traditionally, each component value is assigned using the TFIDF measure from information retrieval. While this weighting method seems well suited to IR, it is not clear that it is the best choice for TC problems: it does not leverage the information implicitly contained in the categorization task when representing documents. In this paper, we introduce a new weighting method based on statistical estimation of the importance of a word for a specific categorization problem. This method also has the benefit of making feature selection implicit, since features that are useless for the categorization problem at hand receive a very small weight. Extensive experiments reported in the paper show that this new weighting method significantly improves classification accuracy, as measured on many categorization tasks.
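
To make the baseline representation concrete, here is a minimal sketch of TFIDF weighting in the Vector Space Model. It covers only the traditional TFIDF scheme, not the new weighting method, whose details the abstract does not give. The abstract does not say which TFIDF variant is meant, so the formula used here (raw term frequency times log(N / document frequency)) is an assumption, and the function name `tfidf_vectors` and the toy documents are purely illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TFIDF vectors for a list of tokenized documents.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count
    of the word in the document, N is the number of documents, and df is
    the number of documents containing the word.
    """
    n_docs = len(docs)
    # One vector component per vocabulary word, in a fixed (sorted) order.
    vocab = sorted({word for doc in docs for word in doc})
    # Document frequency: count each word at most once per document.
    df = Counter(word for doc in docs for word in set(doc))
    idf = {word: math.log(n_docs / df[word]) for word in vocab}

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[word] * idf[word] for word in vocab])
    return vocab, vectors


# Toy corpus (illustrative only, not data from the paper).
docs = [
    "the market fell sharply".split(),
    "the team won the match".split(),
    "the market rallied".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

In this sketch a word that occurs in every document, such as "the", receives weight zero, while rarer words receive larger weights. The weights depend only on corpus statistics and never consult category labels, which is the limitation of TFIDF for TC that the abstract points to.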