In this paper we introduce and discuss the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how they are constructed, i.e., in which elements are considered neighbors. In sn-grams, neighbors are determined by following syntactic relations in syntactic trees rather than by taking words in the order they appear in the text; in other words, sn-grams are constructed by following paths in syntactic trees. In this way, sn-grams bring syntactic knowledge into machine learning methods, although the text must be parsed beforehand to construct them. Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. As a baseline we used traditional n-grams of words, part-of-speech (POS) tags, and characters. Three classifiers were applied: support vector machines (SVM), naive Bayes (NB), and the J48 tree classifier. Sn-grams give better results with the SVM classifier.
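The difference between the two constructions can be illustrated with a minimal sketch. It assumes a dependency parse is already available; the example sentence, its hand-written head indices, and the function names `sn_grams` and `word_n_grams` are hypothetical illustrations, not the authors' implementation. The sketch only follows head-to-dependent paths and ignores relation labels.

```python
from collections import defaultdict

def build_children(heads):
    """Map each head index to the list of its dependents' indices."""
    children = defaultdict(list)
    for dep, head in enumerate(heads):
        if head is not None:  # the root has no head
            children[head].append(dep)
    return children

def sn_grams(tokens, heads, n):
    """Collect n-token paths that follow head -> dependent arcs in the tree."""
    children = build_children(heads)

    def paths_from(node, length):
        # All downward paths of exactly `length` nodes starting at `node`.
        if length == 1:
            return [[node]]
        result = []
        for child in children[node]:
            for tail in paths_from(child, length - 1):
                result.append([node] + tail)
        return result

    grams = []
    for start in range(len(tokens)):
        for path in paths_from(start, n):
            grams.append(tuple(tokens[i] for i in path))
    return grams

def word_n_grams(tokens, n):
    """Traditional n-grams: neighbors are adjacent words in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical hand-written parse (assumption, not parser output):
# heads[i] is the index of the head of tokens[i]; None marks the root.
tokens = ["the", "old", "dog", "barked", "loudly"]
heads = [2, 2, 3, None, 3]

surface_bigrams = word_n_grams(tokens, 2)
syntactic_bigrams = sn_grams(tokens, heads, 2)
syntactic_trigrams = sn_grams(tokens, heads, 3)
```

In this toy parse the surface bigrams include ("the", "old"), two words that are adjacent in the text but not directly related in the tree, whereas the syntactic bigrams include ("dog", "the") and ("barked", "loudly"), which pair each head with its dependent. Either kind of tuple can then be counted and used as a feature for a classifier.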