This paper proposes the use of local histograms (LHs) over character n-grams for authorship attribution (AA). LHs are enriched histogram representations that preserve sequential information in documents. They have already been used successfully with word histograms for text categorization and document visualization. In this work we explore whether LHs over character-level n-grams are suitable for AA. We show that LHs are particularly helpful for AA because they provide information that helps uncover, to some extent, the writing style of authors. We report experimental results on AA data sets confirming that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state-of-the-art approaches. LHs are even more advantageous under challenging conditions, such as imbalanced and small training sets. Our results motivate further research on using LHs to model the writing style of authors in related tasks, such as authorship verification and plagiarism detection.
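The following minimal Python sketch contrasts global and local histograms. It assumes a locally weighted bag-of-words (LOWBOW) style construction. Each n-gram position is mapped to a relative location in [0, 1], and the local histogram at a location `mu` is the kernel-weighted frequency of the n-grams around that location. Several choices here are illustrative assumptions and not the configuration used in this paper: the Gaussian kernel, the number of locations `k`, the bandwidth `sigma`, the n-gram size, the helper names, and the summed Euclidean distance.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of `text`, in document order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def global_histogram(grams, vocab):
    """Usual bag-of-n-grams: relative frequencies, ignoring order."""
    counts = Counter(grams)
    total = len(grams) or 1  # avoid dividing by zero for very short texts
    return [counts[g] / total for g in vocab]


def kernel_weights(length, mu, sigma):
    """Weights for n-gram positions 0..length-1, mapped into (0, 1).

    Assumed kernel: a Gaussian centered at `mu`, renormalized over the
    document so that the weights sum to 1.
    """
    raw = []
    for i in range(length):
        t = (i + 0.5) / length  # relative position of the i-th n-gram
        raw.append(math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)))
    total = sum(raw)
    return [w / total for w in raw]


def local_histograms(grams, vocab, k=5, sigma=0.2):
    """k local histograms at evenly spaced locations mu in [0, 1].

    Each LH is a kernel-weighted histogram of the n-grams near mu. The
    sequence of LHs therefore keeps coarse information about where in
    the document each n-gram tends to occur.
    """
    index = {g: j for j, g in enumerate(vocab)}
    mus = [j / (k - 1) for j in range(k)] if k > 1 else [0.5]
    lhs = []
    for mu in mus:
        h = [0.0] * len(vocab)
        for g, w in zip(grams, kernel_weights(len(grams), mu, sigma)):
            if g in index:  # n-grams outside the vocabulary are dropped
                h[index[g]] += w
        lhs.append(h)
    return lhs


def lh_distance(lhs_a, lhs_b):
    """Sum of Euclidean distances between corresponding local histograms
    (a hypothetical, illustrative choice of document dissimilarity)."""
    return sum(math.dist(a, b) for a, b in zip(lhs_a, lhs_b))


# Same global histogram, different order: only the LHs tell them apart.
a, b = char_ngrams("aaaabbbb", n=1), char_ngrams("bbbbaaaa", n=1)
vocab = sorted(set(a) | set(b))
print(global_histogram(a, vocab) == global_histogram(b, vocab))  # True
print(lh_distance(local_histograms(a, vocab, k=3),
                  local_histograms(b, vocab, k=3)) > 0)  # True
```

The toy pair at the end has identical global histograms but different local histograms. This difference is the sequential information that LHs preserve and global histograms discard. With a very large `sigma`, every local histogram collapses to the global one, so in this sketch the global histogram is a limiting case of the local representation.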