The anonymous nature of online message distribution raises a series of moral and legal issues. Potential authors can be identified individually by analyzing the identity cues that people leave behind in their texts, i.e., their writeprint. However, writeprint identification is a difficult learning task because the stylistic feature set is highly redundant and some authors' writing styles are highly similar. In this paper, we propose a novel method, called semi-random subspace (Semi-RS), to address both problems simultaneously. Unlike the conventional random subspace method (RSM), which samples features from the whole feature set in a completely random way, Semi-RS randomly samples features from each individual-author feature set (IAFS) partitioned from the whole feature set. More specifically, we first divide the whole feature set into several IAFSs in a deterministic way, then construct a set of base classifiers on different feature subsets randomly sampled from each IAFS, and finally combine all base classifiers to make the final decision. Experimental results on the benchmark dataset demonstrate the effectiveness of the proposed method, which improves on previously reported results. In addition, our analysis of the algorithm's diversity reveals that Semi-RS constructs more diverse base classifiers than conventional RSM.
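The three Semi-RS steps described above (deterministic IAFS split, random subspaces drawn inside each IAFS, combination of base classifiers) can be sketched as follows, next to a conventional RSM for contrast. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the deterministic split is made, which base learner is used, or how votes are combined, so the following are hypothetical choices: each feature is assigned to the author with the highest mean value on it, the base learner is a nearest-centroid classifier, the sampling `ratio` is 0.5, and base classifiers are combined by majority vote. The pairwise-disagreement function is one common diversity measure and may differ from the one used in the paper. All function names are illustrative.

```python
import random
from collections import Counter
from statistics import mean


def partition_features(X, y):
    """Hypothetical deterministic split of feature indices into one IAFS per
    author: each feature goes to the author with the highest mean value on it
    (ties go to the alphabetically first author)."""
    authors = sorted(set(y))
    iafs = {a: [] for a in authors}
    for j in range(len(X[0])):
        means = {a: mean(x[j] for x, lab in zip(X, y) if lab == a) for a in authors}
        iafs[max(authors, key=means.get)].append(j)
    return iafs


def train_centroids(X, y, feats):
    """Assumed base learner: per-author centroids on the chosen features."""
    centroids = {}
    for a in set(y):
        rows = [x for x, lab in zip(X, y) if lab == a]
        centroids[a] = [mean(r[j] for r in rows) for j in feats]
    return centroids


def predict_centroid(centroids, feats, x):
    """Label of the nearest centroid (squared Euclidean distance)."""
    def dist(a):
        return sum((x[j] - c) ** 2 for j, c in zip(feats, centroids[a]))
    return min(sorted(centroids), key=dist)


def semi_rs_fit(X, y, n_per_iafs=5, ratio=0.5, seed=0):
    """Semi-RS sketch: random subspaces drawn inside each IAFS, not the whole set."""
    rng = random.Random(seed)
    ensemble = []
    for feats in partition_features(X, y).values():
        if not feats:
            continue  # this author received no features under the split
        k = max(1, int(len(feats) * ratio))
        for _ in range(n_per_iafs):
            sub = rng.sample(feats, k)
            ensemble.append((sub, train_centroids(X, y, sub)))
    return ensemble


def rsm_fit(X, y, n_estimators=10, ratio=0.5, seed=0):
    """Conventional RSM for contrast: subspaces drawn from the whole feature set."""
    rng = random.Random(seed)
    all_feats = list(range(len(X[0])))
    k = max(1, int(len(all_feats) * ratio))
    ensemble = []
    for _ in range(n_estimators):
        sub = rng.sample(all_feats, k)
        ensemble.append((sub, train_centroids(X, y, sub)))
    return ensemble


def ensemble_predict(ensemble, x):
    """Combine base classifiers by majority vote (an assumed combiner)."""
    votes = Counter(predict_centroid(c, sub, x) for sub, c in ensemble)
    return votes.most_common(1)[0][0]


def mean_pairwise_disagreement(ensemble, X_eval):
    """One common diversity measure: average fraction of samples on which
    two base classifiers disagree (the paper's measure may differ)."""
    preds = [[predict_centroid(c, sub, x) for x in X_eval] for sub, c in ensemble]
    pairs = [(p, q) for i, p in enumerate(preds) for q in preds[i + 1:]]
    if not pairs:
        return 0.0
    return mean(sum(a != b for a, b in zip(p, q)) / len(X_eval) for p, q in pairs)


if __name__ == "__main__":
    # Toy data: two authors, four stylistic features.
    X = [[5, 0, 1, 0], [4, 1, 1, 0], [0, 5, 0, 1], [1, 4, 0, 1]]
    y = ["A", "A", "B", "B"]
    print(partition_features(X, y))              # {'A': [0, 2], 'B': [1, 3]}
    ens = semi_rs_fit(X, y, n_per_iafs=3)
    print(ensemble_predict(ens, [5, 0, 1, 0]))   # A
```

The only structural difference between `semi_rs_fit` and `rsm_fit` in this sketch is where each random subspace is drawn: Semi-RS samples within one author's feature subset, so every base classifier is tied to a single IAFS, whereas RSM may mix features from all authors in one subspace.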