Feature selections for authorship attribution

Authors:
Jacques Savoy
Affiliations:
University of Neuchatel, Neuchâtel, Switzerland
Venue:
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Year:
2013

References:

Foundations of statistical natural language processing
A re-examination of text categorization methods. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization. ACM Computing Surveys (CSUR)
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Word association norms, mutual information, and lexicography. ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Feature subset selection bias for classification learning. ICML '06 Proceedings of the 23rd international conference on Machine learning
Entropy-based authorship search in large document collections. ECIR'07 Proceedings of the 29th European conference on IR research
Authorship Attribution Based on Specific Vocabulary. ACM Transactions on Information Systems (TOIS)
Authorship attribution based on a probabilistic topic model. Information Processing and Management: an International Journal

Abstract

The authorship attribution (AA) problem can be viewed as a categorization problem. To determine which features discriminate most effectively between authors, we evaluated six independent feature-scoring selection functions: information gain, pointwise mutual information, odds ratio, χ², DIA, and document frequency (df). To compare these approaches, we selected sports-related articles from a newspaper corpus (La Stampa). Using the Kullback-Leibler (KL) divergence [1] as the attribution scheme, we found that the df selection strategy tends to produce performance levels similar to those of the more complex selection functions.
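
To make the pipeline described in the abstract concrete, here is a minimal sketch of df-based feature selection followed by KL-divergence attribution. It is an illustration under stated assumptions, not the paper's implementation: the toy English sentences, the function names (`select_by_df`, `term_distribution`, `kl_divergence`, `attribute`), the add-alpha smoothing, and the direction of the divergence (query text against author profile) are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def select_by_df(docs, k):
    """Return the k terms that occur in the most documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def term_distribution(tokens, features, alpha=1.0):
    """Relative frequencies over the selected features.

    Add-alpha smoothing (an assumption of this sketch) keeps every
    probability strictly positive so the KL divergence stays finite.
    """
    counts = Counter(t for t in tokens if t in features)
    total = sum(counts.values()) + alpha * len(features)
    return {f: (counts[f] + alpha) / total for f in features}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same features."""
    return sum(p[f] * math.log(p[f] / q[f]) for f in p)


def attribute(query_tokens, author_profiles, features):
    """Assign the query text to the author whose profile is closest in KL terms."""
    q = term_distribution(query_tokens, features)
    scores = {
        author: kl_divergence(q, term_distribution(tokens, features))
        for author, tokens in author_profiles.items()
    }
    return min(scores, key=scores.get), scores


# Toy training data (hypothetical; the paper uses La Stampa sports articles).
training = [
    ("author_a", "the team won the match".split()),
    ("author_a", "the fans cheered the team".split()),
    ("author_b", "a goal was scored".split()),
    ("author_b", "a penalty was missed".split()),
]

# Keep the 4 terms with the highest document frequency.
features = select_by_df([tokens for _, tokens in training], k=4)

# Build one profile per author by concatenating that author's texts.
author_profiles = {}
for author, tokens in training:
    author_profiles.setdefault(author, []).extend(tokens)

predicted, scores = attribute("the team lost the final".split(), author_profiles, features)
print(features)
print(predicted)
```

Document frequency needs no class labels and only one pass over the corpus, which is what makes it the simplest of the six selection functions listed above. The other five would replace only `select_by_df`; the attribution step would stay the same.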