In this article we propose a technique for computing a standardized Z score that identifies the vocabulary specific to a text (or part thereof) relative to an entire corpus. Assuming that term occurrences follow a binomial distribution, we apply this method to weight terms (words and punctuation symbols in the current study), so that each weight represents the lexical specificity of the term in the underlying text. In a final stage, we define an author profile by averaging these text representations, and we then combine the profiles with a distance measure to derive a simple and efficient authorship attribution scheme. To evaluate this algorithm and demonstrate its effectiveness, we conduct two experiments, the first based on 5,408 newspaper articles (Glasgow Herald) written in English by 20 distinct authors and the second on 4,326 newspaper articles (La Stampa) written in Italian by 20 distinct authors. These experiments demonstrate that the suggested classification scheme tends to perform better than the Delta rule method based on the most frequent words, better than the chi-square distance based on word profiles and punctuation marks, better than the KLD scheme based on a predefined set of words, and better than the naïve Bayes approach.
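The three stages described above (term weighting by Z score, profile averaging, distance-based attribution) can be illustrated with a minimal Python sketch. It is not the article's implementation, and several details are assumptions made only for illustration: the occurrence probability `p` of a term is estimated from its relative frequency in the whole corpus, the Z score uses the usual normal approximation of the binomial, `(tf - n*p) / sqrt(n*p*(1-p))`, and the distance is plain Euclidean distance, since the abstract does not name the measure. All function names are hypothetical.

```python
import math
from collections import Counter


def z_scores(text_tokens, corpus_counts, corpus_size, vocabulary):
    """Z score of each vocabulary term in one text, relative to the corpus.

    Assumption: p is the term's relative frequency in the whole corpus, and
    the term frequency in a text of n tokens is treated as Binomial(n, p).
    """
    n = len(text_tokens)
    tf = Counter(text_tokens)
    scores = {}
    for term in vocabulary:
        p = corpus_counts.get(term, 0) / corpus_size
        sd = math.sqrt(n * p * (1 - p))
        # A zero standard deviation (empty text, p = 0 or p = 1) gives no signal.
        scores[term] = (tf[term] - n * p) / sd if sd > 0 else 0.0
    return scores


def author_profile(score_dicts, vocabulary):
    """Average the Z-score vectors of all texts written by one author."""
    return {t: sum(s[t] for s in score_dicts) / len(score_dicts)
            for t in vocabulary}


def distance(a, b, vocabulary):
    """Euclidean distance between two Z-score vectors (an assumed choice)."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocabulary))


def attribute(query_scores, profiles, vocabulary):
    """Return the author whose profile lies closest to the query text."""
    return min(profiles,
               key=lambda author: distance(query_scores, profiles[author], vocabulary))
```

In use, one would count every term over the training corpus, compute `z_scores` for each training text, average them per author with `author_profile`, and pass the Z-score vector of a disputed text to `attribute`. Terms that are over-used in a text relative to the corpus receive positive scores and under-used terms receive negative ones, which is what lets the averaged profile reflect an author's specific vocabulary.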