We present an approach to automatically recover hidden attributes of scientific articles, such as whether the author is a native English speaker, whether the author is male or female, and whether the paper was published in conference or workshop proceedings. We train classifiers to predict these attributes in computational linguistics papers. The classifiers perform well in this challenging domain, identifying non-native writing with 95% accuracy (over a baseline of 67%). We show the benefits of using syntactic features in stylometry: syntax leads to significant improvements over bag-of-words models on all three tasks, achieving 10% to 25% relative error reduction. We give a detailed analysis of which words and syntactic constructions are most predictive of each attribute, and we show a strong correlation between our predictions and a paper's number of citations.
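
The abstract reports gains as relative error reduction, which is easy to confuse with a difference in accuracy. The sketch below shows the usual definition, assuming error is simply one minus accuracy; the function name `relative_error_reduction` is hypothetical and not taken from the paper. The 67% and 95% figures are the baseline and classifier accuracies quoted above for non-native writing. The 0.80 and 0.85 figures are made-up values chosen only to show what a 25% relative reduction looks like, since the bag-of-words accuracies are not given here.

```python
def relative_error_reduction(acc_base: float, acc_new: float) -> float:
    """Fraction of the baseline's errors removed by the new system.

    Assumes error rate = 1 - accuracy, with both accuracies in [0, 1].
    """
    err_base = 1.0 - acc_base
    err_new = 1.0 - acc_new
    if err_base <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (err_base - err_new) / err_base


if __name__ == "__main__":
    # Accuracies quoted in the abstract: 67% baseline, 95% classifier.
    print(f"{relative_error_reduction(0.67, 0.95):.2f}")
    # Hypothetical accuracies (not from the paper): 80% -> 85%.
    print(f"{relative_error_reduction(0.80, 0.85):.2f}")
```

Under this definition, moving from 67% to 95% accuracy removes roughly 85% of the baseline's errors, while a five-point gain from a hypothetical 80% starting point removes only 25% of them. The 10% to 25% range reported for syntactic features is measured against bag-of-words models, not against the 67% baseline.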