This paper proposes a novel representation for Authorship Attribution (AA) based on Concise Semantic Analysis (CSA), a technique that has been used successfully in Text Categorization (TC). Our approach, called Document Author Representation (DAR), represents each document as a vector in a space whose dimensions are the candidate authors, computed from the relationship between textual features and authors. To evaluate the approach, we compare DAR with conventional representations and with previous work on the C50 corpus. We find that DAR is well suited to AA tasks: it performs well on imbalanced data, achieving accuracy comparable to or better than the approaches it is compared against.
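The abstract describes DAR only at a high level, so the sketch below shows the general idea of building document vectors in an author space. It assumes a simple CSA-style weighting chosen purely for illustration: each term receives a weight per author, w(t, a) = log2(1 + tf(t, a) / N_a), where tf(t, a) is the term's count in author a's training texts and N_a is that author's total token count. Each term vector is then L2-normalized, and a document is the frequency-weighted sum of its term vectors. This formula, the normalization steps, and the function names `term_author_weights` and `dar_vector` are hypothetical and are not the paper's exact formulation.

```python
import math
from collections import Counter, defaultdict


def term_author_weights(docs, authors):
    """Hypothetical CSA-style weighting (an assumption, not the paper's formula):
    w(t, a) = log2(1 + tf(t, a) / N_a), then each term's author vector is
    L2-normalized so frequent and rare terms contribute on a comparable scale."""
    tf = defaultdict(Counter)          # tf[term][author] = raw count
    tokens_per_author = Counter()      # N_a = total tokens written by author a
    for tokens, author in zip(docs, authors):
        tokens_per_author[author] += len(tokens)
        for t in tokens:
            tf[t][author] += 1
    # Fixed author ordering defines the dimensions of the author space.
    author_list = sorted(a for a, n in tokens_per_author.items() if n > 0)
    weights = {}
    for term, counts in tf.items():
        vec = [math.log2(1 + counts[a] / tokens_per_author[a]) for a in author_list]
        norm = math.sqrt(sum(v * v for v in vec))  # > 0: term was seen at least once
        weights[term] = [v / norm for v in vec]
    return author_list, weights


def dar_vector(tokens, author_list, weights):
    """Represent a document in author space: sum each known term's author
    vector scaled by its in-document frequency (unseen terms are skipped),
    then L2-normalize the result."""
    vec = [0.0] * len(author_list)
    for term, freq in Counter(tokens).items():
        w = weights.get(term)
        if w is None:
            continue
        for i, v in enumerate(w):
            vec[i] += freq * v
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Toy usage with made-up sentences (not C50 data).
train_docs = [
    "the market rallied as shares rose".split(),
    "shares rose again in the market".split(),
    "the court ruled on the appeal".split(),
    "the judge heard the appeal today".split(),
]
train_authors = ["alice", "alice", "bob", "bob"]
authors, W = term_author_weights(train_docs, train_authors)
v = dar_vector("shares fell in the market".split(), authors, W)
# Toy decision rule: pick the author dimension with the largest component.
predicted = authors[max(range(len(authors)), key=v.__getitem__)]
print(authors, v, predicted)
```

The resulting vectors have one dimension per author, so their size is independent of vocabulary size. This is what makes the representation compact. Picking the largest component is only a toy decision rule; in an evaluation such as the one described above, the DAR vectors would more plausibly be passed to a standard classifier.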