Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Kernel k-means: spectral clustering and normalized cuts
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Segmenting documents by stylistic character
Natural Language Engineering
Journal of the American Society for Information Science and Technology
Computational methods in authorship attribution
Journal of the American Society for Information Science and Technology
A survey of modern authorship attribution methods
Journal of the American Society for Information Science and Technology
A classifier system for author recognition using synonym-based features
MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
Intrinsic plagiarism detection
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially "munged" from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.