A weighted profile intersection measure for profile-based authorship attribution

Authors:
Hugo Jair Escalante;Manuel Montes-y-Gómez;Thamar Solorio
Affiliations:
Universidad Autónoma de Nuevo León, San Nicolas de los Garza, N. L., Mexico;Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico;University of Alabama at Birmingham, Birmingham, AL
Venue:
MICAI'11 Proceedings of the 10th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
Year:
2011

Cites (14):
Machine learning in automated text categorization. ACM Computing Surveys (CSUR)
Statistical Comparisons of Classifiers over Multiple Data Sets. The Journal of Machine Learning Research
Authorship attribution. Foundations and Trends in Information Retrieval
Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology
A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology
Author Identification Using a Tensor Space Representation. Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Authorship attribution and verification with many authors and limited data. COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Forensic Authorship Attribution Using Compression Distances to Prototypes. IWCF '09 Proceedings of the 3rd International Workshop on Computational Forensics
Authorship attribution using probabilistic context-free grammars. ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Authorship Attribution for Twitter in 140 Characters or Less. CTC '10 Proceedings of the 2010 Second Cybercrime and Trustworthy Computing Workshop
An evaluation framework for plagiarism detection. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Local histograms of character N-grams for authorship attribution. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Authorship attribution using word sequences. CIARP'06 Proceedings of the 11th Iberoamerican conference on Progress in Pattern Recognition, Image Analysis and Applications
N-Gram feature selection for authorship identification. AIMSA'06 Proceedings of the 12th international conference on Artificial Intelligence: methodology, Systems, and Applications

Cited by (3):
The use of orthogonal similarity relations in the prediction of authorship. CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Measuring feature distributions in sentiment classification. MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Computational Intelligence - Volume Part II
REPLOT: REtrieving profile links on Twitter for suspicious networks detection. Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Abstract

This paper introduces a new similarity measure, weighted profile intersection (WPI), for profile-based authorship attribution (PBAA). Authorship attribution (AA) is the task of determining which of a set of candidate authors wrote a given document. Under PBAA, an author's profile is built by combining information extracted from sample documents written by that author, and an unseen document is attributed to the author whose profile is most similar to it. Although PBAA has achieved competitive performance, it is limited in that the most commonly used similarity measure accounts only for the number of terms shared by a test document and an author's profile. We propose a new measure for PBAA, WPI, which takes into account an inter-author term penalization factor in addition to the number of overlapping terms. Intuitively, when computing the similarity between an author's profile and a test document, WPI relies more on terms that are (frequently) used by the author of interest and not (frequently) used by other authors. We evaluate the proposed method on several AA data sets, including many data subsets from Twitter. Experimental results show that the proposed technique outperforms the standard PBAA method on all of the considered data sets, even though the baseline method itself proved very effective. Further, the proposed method achieves performance comparable to that of classifier-based AA methods (e.g., methods based on SVMs), which often obtain better classification results at the expense of limited interpretability and a higher computational cost.
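
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the WPI formula, so the penalization used here (each shared term contributes 1 divided by the number of author profiles that contain it) is an assumption chosen only to show the intuition. The function names, the use of whitespace-separated words as terms, and the profile size are likewise hypothetical.

```python
from collections import Counter


def build_profile(docs, size=500):
    """Author profile: the `size` most frequent terms in the author's sample documents.

    Terms are whitespace-separated lowercase words (an assumption made for brevity).
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return {term for term, _ in counts.most_common(size)}


def intersection_count(doc_terms, author_profile):
    """Baseline similarity: number of terms shared by the document and the profile."""
    return len(doc_terms & author_profile)


def weighted_intersection(doc_terms, author, profiles):
    """Illustrative weighted similarity (NOT the paper's exact formula).

    Each shared term contributes 1 / (number of author profiles containing it),
    so terms used by many authors count less than terms specific to one author.
    """
    score = 0.0
    for term in doc_terms & profiles[author]:
        n_authors_with_term = sum(1 for p in profiles.values() if term in p)
        score += 1.0 / n_authors_with_term
    return score


def attribute(doc_terms, profiles, weighted=True):
    """Return the author whose profile is most similar to the document."""
    if weighted:
        return max(profiles, key=lambda a: weighted_intersection(doc_terms, a, profiles))
    return max(profiles, key=lambda a: intersection_count(doc_terms, profiles[a]))


if __name__ == "__main__":
    # Toy profiles given directly as term sets (hypothetical data).
    profiles = {
        "a": {"z"},
        "b": {"x", "y"},
        "c": {"x", "y"},
        "d": {"x", "y"},
    }
    doc_terms = {"x", "y", "z"}
    print(attribute(doc_terms, profiles, weighted=False))
    print(attribute(doc_terms, profiles, weighted=True))
```

In the toy data, the terms "x" and "y" appear in three profiles while "z" appears in only one, so the plain count and the weighted score can rank the candidate authors differently. This is the kind of case the inter-author penalization is meant to address.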