Foundations of statistical natural language processing
A re-examination of text categorization methods. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization. ACM Computing Surveys (CSUR)
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Word association norms, mutual information, and lexicography. ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Feature subset selection bias for classification learning. ICML '06 Proceedings of the 23rd international conference on Machine learning
Entropy-based authorship search in large document collections. ECIR '07 Proceedings of the 29th European conference on IR research
Authorship Attribution Based on Specific Vocabulary. ACM Transactions on Information Systems (TOIS)
Authorship attribution based on a probabilistic topic model. Information Processing and Management: an International Journal
The authorship attribution (AA) problem can be viewed as a text categorization problem. To determine the most effective features for discriminating between authors, we evaluated six independent feature-scoring functions for feature selection: information gain, pointwise mutual information, odds ratio, χ², DIA, and document frequency (df). To compare these approaches, we selected sports articles from a newspaper corpus (La Stampa). Using the Kullback-Leibler (KL) divergence [1] as the attribution scheme, we found that the simple df selection strategy tends to achieve performance levels similar to those of the more complex scoring functions.
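To make the pipeline concrete, here is a minimal sketch under several assumptions that are not taken from the paper. Documents are assumed to be pre-tokenized, and each author profile pools that author's training texts. Distributions over the selected vocabulary use add-one (Laplace) smoothing. A query is attributed to the author minimizing D_KL(Q ‖ A) = Σ_t Q(t) log(Q(t) / A(t)). The χ² scorer (maximum over authors of the 2×2 term/author statistic) stands in for the "more complex" functions. The helper names (`df_scores`, `chi2_scores`, `select`, `attribute`), the smoothing choice, the KL direction and the toy data are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Doc = Tuple[str, List[str]]  # (author label, token list); hypothetical representation


def df_scores(docs: List[Doc]) -> Dict[str, float]:
    """Document frequency: number of training documents containing each term."""
    df: Counter = Counter()
    for _, tokens in docs:
        df.update(set(tokens))
    return {t: float(c) for t, c in df.items()}


def chi2_scores(docs: List[Doc]) -> Dict[str, float]:
    """Max over authors of chi2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = len(docs)
    authors = {author for author, _ in docs}
    doc_sets = [(author, set(tokens)) for author, tokens in docs]
    vocab = {t for _, s in doc_sets for t in s}
    scores: Dict[str, float] = {}
    for t in vocab:
        best = 0.0
        for c in authors:
            n_tc = sum(1 for au, s in doc_sets if au == c and t in s)      # A: term, author
            n_to = sum(1 for au, s in doc_sets if au != c and t in s)      # B: term, others
            n_nc = sum(1 for au, s in doc_sets if au == c and t not in s)  # C: no term, author
            n_no = n - n_tc - n_to - n_nc                                   # D: no term, others
            denom = (n_tc + n_nc) * (n_to + n_no) * (n_tc + n_to) * (n_nc + n_no)
            if denom:
                best = max(best, n * (n_tc * n_no - n_nc * n_to) ** 2 / denom)
        scores[t] = best
    return scores


def select(docs: List[Doc], scorer: Callable[[List[Doc]], Dict[str, float]], k: int) -> List[str]:
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = scorer(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def distribution(tokens: List[str], vocab: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed term distribution restricted to the selected vocabulary."""
    kept = set(vocab)
    counts = Counter(t for t in tokens if t in kept)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL divergence D(p || q); smoothing guarantees every q[t] > 0."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def attribute(train: List[Doc], query: List[str], vocab: List[str]) -> str:
    """Return the author whose pooled profile is closest to the query in KL."""
    pooled: Dict[str, List[str]] = {}
    for author, tokens in train:
        pooled.setdefault(author, []).extend(tokens)
    q = distribution(query, vocab)
    divergences = {a: kl(q, distribution(toks, vocab)) for a, toks in pooled.items()}
    return min(divergences, key=divergences.get)


# Toy example (invented data, not from La Stampa).
train: List[Doc] = [
    ("rossi", "il gol della partita il gol".split()),
    ("rossi", "gol e ancora gol in campo".split()),
    ("bianchi", "la squadra ha vinto la coppa".split()),
    ("bianchi", "la coppa alla squadra di casa".split()),
]
vocab = select(train, df_scores, 4)  # -> ["coppa", "gol", "la", "squadra"]
guess = attribute(train, "un gol nel finale della partita".split(), vocab)  # -> "rossi"
```

Swapping `df_scores` for `chi2_scores` in `select` is the only change needed to compare scorers in this sketch, which mirrors the kind of comparison the abstract describes. In this sketch, df needs one pass over the corpus, while χ² needs a per-author contingency table for every term.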