Syntactic dependency-based n-grams as classification features

Authors:
Grigori Sidorov;Francisco Velasquez;Efstathios Stamatatos;Alexander Gelbukh;Liliana Chanona-Hernández
Affiliations:
Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico;Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico;University of the Aegean, Greece;Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico;ESIME, Instituto Politécnico Nacional (IPN), Mexico City, Mexico
Venue:
MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Computational Intelligence - Volume Part II
Year:
2012

Abstract

In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how neighboring elements are defined. For sn-grams, neighbors are obtained by following syntactic relations in a syntactic tree rather than by taking words in the order in which they appear in the text. Dependency trees fit this idea directly, while constituency trees require a few simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As baselines we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams are better than those obtained with the baselines.
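
The difference between surface neighbors and syntactic neighbors can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that an sn-gram is a continuous head-to-dependent path in a dependency tree, and the function name `sn_grams`, the example sentence, and its hand-written parse (the `heads` list) are all hypothetical choices made for illustration.

```python
def sn_grams(words, heads, n):
    """Collect word sn-grams of length n by following head -> dependent arcs.

    words: list of tokens.
    heads: heads[i] is the index of the head of token i, or -1 for the root.
    Assumption: an sn-gram is a continuous downward path in the tree.
    """
    # Build a head -> dependents map from the head index list.
    children = {i: [] for i in range(len(words))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths(node, length):
        # Enumerate all downward paths with `length` nodes starting at `node`.
        if length == 1:
            yield (node,)
            return
        for child in children[node]:
            for rest in paths(child, length - 1):
                yield (node,) + rest

    result = []
    for start in range(len(words)):
        for path in paths(start, n):
            result.append(tuple(words[i] for i in path))
    return result


# Hypothetical, hand-annotated parse of "the dog chased a cat":
# "chased" is the root; "dog" and "cat" depend on it; each determiner
# depends on its noun.
words = ["the", "dog", "chased", "a", "cat"]
heads = [1, 2, -1, 4, 2]

bigrams = sn_grams(words, heads, 2)
trigrams = sn_grams(words, heads, 3)
```

In this example the sn-gram bigrams include ("chased", "cat"), which links the verb to its object even though the two words are not adjacent in the text, whereas the surface bigram ("chased", "a") does not appear because no syntactic relation connects those words directly. Counts of such tuples could then serve as features for a classifier in the same way as counts of traditional n-grams.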