Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms characterizing a document (or a sample of texts). We then show how these Z score values can be used to derive an efficient categorization scheme. To evaluate this proposal, we categorize speeches given by B. Obama as either electoral or presidential. Under 10-fold cross-validation, the results tend to show that the suggested classification scheme performs better than both a Support Vector Machine and a Naive Bayes classifier.
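As an illustration of the standardized score described above, the following is a minimal sketch, not the authors' implementation. It assumes that the occurrence probability of a term is estimated by its relative frequency in the entire corpus, and that the subset count is standardized with the binomial mean n'p and variance n'p(1-p), where n' is the subset size in tokens. The function name `z_scores` and its token-list inputs are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Return a {term: Z score} dict for a subset relative to the whole corpus.

    Assumption (not stated in the abstract): p is estimated as the term's
    relative frequency in the entire corpus, and the subset is part of it.
    """
    corpus_counts = Counter(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    n_corpus = len(corpus_tokens)
    n_subset = len(subset_tokens)

    scores = {}
    for term, tf_corpus in corpus_counts.items():
        p = tf_corpus / n_corpus                    # estimated occurrence probability
        expected = n_subset * p                     # binomial mean in the subset
        std_dev = math.sqrt(n_subset * p * (1 - p)) # binomial standard deviation
        if std_dev == 0:
            continue  # term fills the whole corpus, or the subset is empty
        observed = subset_counts.get(term, 0)
        scores[term] = (observed - expected) / std_dev
    return scores


# Toy data: "a" is rare in the corpus but frequent in the subset.
corpus = ["a"] * 10 + ["b"] * 90
subset = ["a"] * 5 + ["b"] * 5
scores = z_scores(subset, corpus)
```

In this toy example the term "a" receives a positive score (over-used in the subset) and "b" a negative one (under-used). A large positive score marks a term as belonging to the specific vocabulary of the subset; any threshold applied to the scores would be a further assumption.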