Semi-random subspace method for writeprint identification

Authors:
Zhi Liu; Zongkai Yang; Sanya Liu; Yinghui Shi
Affiliation (all authors):
National Engineering Research Center for E-Learning, 152 Luoyu Road, Wuhan, Hubei 430079, PR China
Venue:
Neurocomputing
Year:
2013

Abstract

The anonymous nature of online message distribution raises a series of moral and legal issues. By analyzing the identity cues that people leave behind in their texts, i.e., their writeprint, potential authors can be identified individually. However, writeprint identification is a difficult learning task because of the high redundancy of the stylistic feature set and the high similarity of some authors' writing styles. In this paper, we propose a novel method, called semi-random subspace (Semi-RS), to address both problems simultaneously. Unlike the conventional random subspace method (RSM), which samples features from the whole feature set in a completely random way, Semi-RS randomly samples features from each individual-author feature set (IAFS) partitioned from the whole feature set. More specifically, we first divide the whole feature set into several IAFSs in a deterministic way, then construct a set of base classifiers on different feature subsets randomly sampled from each IAFS, and finally combine all base classifiers for the final decision. Experimental results on the benchmark dataset demonstrate the effectiveness of the proposed method, which improves on previously reported results. In addition, an analysis of ensemble diversity reveals that Semi-RS constructs more diverse base classifiers than conventional RSMs.
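The three Semi-RS steps in the abstract (deterministic partition into IAFSs, random sampling within each IAFS, combination of all base classifiers) can be illustrated with a minimal pure-Python sketch. The abstract does not specify how the IAFSs are formed, which base classifier is used, how outputs are combined, or which diversity measure is reported, so every one of those pieces below is a hypothetical stand-in: `partition_iafs` assigns each feature to the author with the highest mean value on it, the base learner is a nearest-centroid classifier, combination is majority voting, and `disagreement` is a simple label-disagreement measure. The parameters `n_per_iafs` and `subspace_frac` are likewise illustrative. A conventional RSM baseline (`rsm_fit`) is included for contrast.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical sketch of Semi-RS vs. conventional RSM. The IAFS partition rule,
# the nearest-centroid base classifier, majority voting and the diversity
# measure are assumptions, not details taken from the paper.


def centroids(X, y, feats):
    """Per-author mean vectors restricted to the feature indices in `feats`."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_centroid(model, feats, x):
    """Nearest-centroid prediction (squared Euclidean distance) on `feats` only."""
    return min(model, key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, model[c])))


def partition_iafs(X, y):
    """HYPOTHETICAL deterministic partition: each feature is assigned to the
    author with the highest mean value on it, giving one IAFS per author."""
    n_feat = len(X[0])
    means = centroids(X, y, list(range(n_feat)))
    authors = sorted(means)
    iafs = {a: [] for a in authors}
    for f in range(n_feat):
        iafs[max(authors, key=lambda a: means[a][f])].append(f)
    return iafs


def semi_rs_fit(X, y, n_per_iafs=5, subspace_frac=0.5, seed=0):
    """Semi-RS: sample random subspaces within each IAFS, not the whole set."""
    rng = random.Random(seed)
    ensemble = []
    for feats in partition_iafs(X, y).values():
        if not feats:
            continue  # under this partition rule an author may own no features
        k = max(1, int(len(feats) * subspace_frac))
        for _ in range(n_per_iafs):
            sub = sorted(rng.sample(feats, k))
            ensemble.append((sub, centroids(X, y, sub)))
    return ensemble


def rsm_fit(X, y, n_classifiers=10, subspace_frac=0.5, seed=0):
    """Conventional RSM baseline: sample subspaces from the whole feature set."""
    rng = random.Random(seed)
    all_feats = list(range(len(X[0])))
    k = max(1, int(len(all_feats) * subspace_frac))
    ensemble = []
    for _ in range(n_classifiers):
        sub = sorted(rng.sample(all_feats, k))
        ensemble.append((sub, centroids(X, y, sub)))
    return ensemble


def ensemble_predict(ensemble, x):
    """Combine all base classifiers by simple majority vote."""
    votes = Counter(predict_centroid(model, feats, x) for feats, model in ensemble)
    return votes.most_common(1)[0][0]


def disagreement(ensemble, X):
    """Mean pairwise disagreement: fraction of samples on which two base
    classifiers predict different labels, averaged over all pairs."""
    preds = [[predict_centroid(model, feats, x) for x in X] for feats, model in ensemble]
    pairs = list(combinations(preds, 2))
    if not pairs:
        return 0.0
    return sum(sum(a != b for a, b in zip(p, q)) / len(X) for p, q in pairs) / len(pairs)
```

In this sketch each non-empty IAFS contributes exactly `n_per_iafs` base classifiers, so every author's assigned features are guaranteed to appear in some subspaces, whereas plain RSM samples from the whole set and may under-represent a given author's features by chance. How the diversity of the two ensembles compares on real data would depend on the assumed partition rule and base learner, not only on the sampling scheme.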