We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach rests on simple information-theoretic principles and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate its effectiveness and language independence, we present experimental results on Greek, English, and Chinese data, and show that the approach achieves state-of-the-art performance in each case. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
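The general idea can be illustrated with a minimal sketch: train one character-level n-gram model per candidate author, then attribute a disputed text to the author whose model assigns it the lowest cross-entropy. The abstract does not specify the n-gram order or the smoothing scheme, so the trigram order, the add-one smoothing, the names `CharNgramModel` and `attribute`, and the toy texts below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all character n-grams of text, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, text, n=3):
        self.n = n
        grams = char_ngrams(text, n)
        self.ngram_counts = Counter(grams)
        # Count of each (n-1)-character context that precedes a character.
        self.context_counts = Counter(g[:-1] for g in grams)

    def cross_entropy(self, text, vocab_size):
        """Average negative log2 probability per character of text."""
        grams = char_ngrams(text, self.n)
        if not grams:
            raise ValueError("text must be non-empty")
        total = 0.0
        for g in grams:
            numerator = self.ngram_counts[g] + 1
            denominator = self.context_counts[g[:-1]] + vocab_size
            total -= math.log2(numerator / denominator)
        return total / len(grams)


def attribute(training_texts, disputed, n=3):
    """Return (best_author, scores); a lower cross-entropy is a better match."""
    # Shared character vocabulary so all models smooth over the same events.
    vocab = set(disputed)
    for text in training_texts.values():
        vocab |= set(text)
    models = {a: CharNgramModel(t, n) for a, t in training_texts.items()}
    scores = {a: m.cross_entropy(disputed, len(vocab)) for a, m in models.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy data; real experiments would use full author corpora.
    samples = {
        "author_a": "the cat sat on the mat. the cat sat on the hat. ",
        "author_b": "zebras quickly jump; zebras quickly vex. ",
    }
    best, scores = attribute(samples, "the cat sat on the mat.")
    print(best, scores)
```

Because the models operate on raw characters, the sketch needs no tokenizer, stemmer, or hand-picked feature set, which is what makes this style of approach easy to apply across languages with different writing systems.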