Unsupervised decomposition of a document into authorial components

Authors:
Moshe Koppel; Navot Akiva; Idan Dershowitz; Nachum Dershowitz
Affiliations:
Bar-Ilan University, Ramat Gan, Israel; Bar-Ilan University, Ramat Gan, Israel; Hebrew University, Jerusalem, Israel; Tel Aviv University, Ramat Aviv, Israel
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially "munged" from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.
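
The role of synonym choice can be illustrated with a toy example. The sketch below is not the authors' implementation: the synonym sets, the sample chunks, and the function names (`synonym_vector`, `split_in_two`) are all hypothetical, and the seed-and-assign split is a simple stand-in for whatever clustering the paper actually uses. It only shows the general idea that chunks of text can be represented by which member of each synonym set they use, and then grouped by similarity without any labels.

```python
from math import sqrt

# Hypothetical synonym sets chosen for illustration; not taken from the paper.
SYNSETS = [("big", "large"), ("begin", "start"), ("quick", "fast")]
VOCAB = [word for group in SYNSETS for word in group]


def synonym_vector(chunk):
    """Binary vector: 1 if the chunk uses a given synonym, else 0."""
    words = set(chunk.lower().split())
    return [1 if w in words else 0 for w in VOCAB]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_in_two(chunks):
    """Assign each chunk to one of two groups (requires at least two chunks).

    Assumption: the least similar pair of chunks serves as the two seeds,
    and every chunk joins the seed it resembles more.
    """
    vecs = [synonym_vector(c) for c in chunks]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    seed_a, seed_b = min(pairs, key=lambda p: cosine(vecs[p[0]], vecs[p[1]]))
    return [
        0 if cosine(v, vecs[seed_a]) >= cosine(v, vecs[seed_b]) else 1
        for v in vecs
    ]


# Invented chunks: two "authors" who consistently prefer different synonyms.
chunks = [
    "the big dog will begin to run quick",
    "a large cat will start to move fast",
    "big trees begin to grow",
    "large rivers start to flow fast",
]
labels = split_in_two(chunks)
```

With these invented chunks, the first and third chunks land in one group and the second and fourth in the other, because each pair shares its synonym preferences. Real text would need far more synonym sets and a more robust clustering step than this toy split.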