Unsupervised decomposition of a document into authorial components

  • Authors:
  • Moshe Koppel; Navot Akiva; Idan Dershowitz; Nachum Dershowitz

  • Affiliations:
  • Bar-Ilan University, Ramat Gan, Israel; Bar-Ilan University, Ramat Gan, Israel; Hebrew University, Jerusalem, Israel; Tel Aviv University, Ramat Aviv, Israel

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year:
  • 2011

Abstract

We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially "munged" from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.
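
To make the synonym-choice idea concrete, the following is a minimal toy sketch, not the authors' actual method, which the abstract does not describe in detail. It assumes a hand-made list of English synonym sets (SYNSETS), represents each chunk of text by which synonyms it uses, and splits the chunks into two groups by seeding with the least similar pair. All names here (SYNSETS, synonym_vector, two_way_split) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical synonym sets (an assumption for illustration only):
# each tuple lists words that could be used interchangeably.
SYNSETS = [("big", "large"), ("begin", "start"), ("quick", "fast")]


def synonym_vector(chunk):
    """Binary vector with one entry per synonym: 1 if the chunk uses it."""
    words = set(chunk.lower().split())
    return [1 if w in words else 0 for syn in SYNSETS for w in syn]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_way_split(chunks):
    """Seed two clusters with the least similar pair of chunks,
    then assign every chunk to the seed it resembles more."""
    vecs = [synonym_vector(c) for c in chunks]
    i, j = min(
        combinations(range(len(vecs)), 2),
        key=lambda p: cosine(vecs[p[0]], vecs[p[1]]),
    )
    return [0 if cosine(v, vecs[i]) >= cosine(v, vecs[j]) else 1 for v in vecs]


chunks = [
    "the big dog will begin to run",
    "a large cat will start to eat",
    "big birds begin singing",
    "large fish start swimming",
]
print(two_way_split(chunks))  # [0, 1, 0, 1]
```

In this toy input, chunks that prefer "big" and "begin" land in one group and chunks that prefer "large" and "start" land in the other, even though the topics overlap. A realistic system would need a principled source of synonym sets and a more robust clustering step.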