Linguistic information can help improve the evaluation of similarity between documents; however, the kind of linguistic information to use depends on the task. In this paper, we show that distributions of syntactic structures capture the way works are written and correctly identify individual books more than 76% of the time. In comparison, baseline features such as tf-idf-weighted keywords and function words give an accuracy of at most 66%. However, testing the same features on authorship attribution shows that distributions of syntactic structures are less successful than function words on this task: syntactic structures vary even among the works of the same author, whereas features such as function words are distributed more similarly among the works of an author and can capture authorship more effectively.
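To make the function-word baseline concrete, the following is a minimal sketch of how texts might be compared by their function-word distributions. It is an illustration only: the ten-word list, the function names, and the nearest-profile cosine comparison are all assumptions introduced here, and the abstract does not say which word list or classifier the paper actually used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the abstract does not specify the list used in the paper).
FUNCTION_WORDS = ("the", "of", "and", "a", "in", "to", "but", "that", "it", "was")


def function_word_profile(text):
    """Return the relative frequency of each function word among all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def attribute(unknown_text, known_texts):
    """Return the label in known_texts whose profile is most similar to unknown_text.

    known_texts maps a label (for example an author name) to sample text.
    Nearest-profile matching is a stand-in here, not the paper's classifier.
    """
    target = function_word_profile(unknown_text)
    return max(
        known_texts,
        key=lambda label: cosine_similarity(
            target, function_word_profile(known_texts[label])
        ),
    )
```

Calling `attribute(sample, {"author_a": text_a, "author_b": text_b})` returns the label whose function-word profile lies closest to that of `sample`. The same comparison could in principle be run over counts of syntactic structures instead of words, but that would need a parser and is outside this sketch.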