Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

  • Authors:
  • Tomonari Masada

  • Affiliations:
  • Nagasaki University, Japan

  • Venue:
  • International Journal of Organizational and Collective Intelligence
  • Year:
  • 2011

Abstract

This paper introduces a new approach to large-scale unsupervised segmentation of bibliographic elements. The problem is to segment a citation, given as an untagged sequence of word tokens, into subsequences so that each subsequence corresponds to a different bibliographic element (e.g., authors, paper title, journal name, or publication year). Each bibliographic element should be referred to by contiguous word tokens; this requirement is called the contiguity constraint. The author meets this constraint by using generalized Mallows models, which Chen, Branavan, Barzilay, and Karger (2009) applied effectively to document structure learning. However, that method works for this problem only after modification, so the author proposes strategies to make it applicable.
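
As a rough illustration of how a latent permutation can enforce the contiguity constraint, the following Python sketch uses the inversion-count representation of a generalized Mallows model over K element types. It is a minimal sketch under stated assumptions, not the author's implementation: the function names, the element names, the token counts, and the dispersion values are all hypothetical, chosen only for illustration. An ordering of element types is decoded from an inversion vector, and each type is then assigned a run of consecutive tokens, so every element occupies one contiguous span by construction.

```python
import math


def decode_permutation(inversions):
    """Decode an ordering of items 0..K-1 from K-1 inversion counts.

    inversions[j] is the number of items larger than j that precede j.
    Items are inserted from largest to smallest, so later insertions
    (smaller items) never change the counts of items already placed.
    """
    k = len(inversions) + 1
    perm = [k - 1]
    for item in range(k - 2, -1, -1):
        perm.insert(inversions[item], item)
    return perm


def gmm_log_prob(inversions, rho):
    """Log-probability of an inversion vector under a generalized Mallows model.

    rho[j] > 0 is the dispersion for item j; larger values concentrate
    mass on inversions[j] == 0, i.e. on the identity ordering.
    """
    k = len(inversions) + 1
    logp = 0.0
    for j, (v, r) in enumerate(zip(inversions, rho)):
        n_values = k - j  # inversions[j] ranges over 0 .. k-1-j
        log_norm = math.log(sum(math.exp(-r * t) for t in range(n_values)))
        logp += -r * v - log_norm
    return logp


def contiguous_labels(perm, counts, names):
    """Label tokens so that each element type fills one contiguous run.

    counts[i] is the number of tokens for element type i (0 if absent).
    """
    labels = []
    for item in perm:
        labels.extend([names[item]] * counts[item])
    return labels


# Hypothetical element types and token counts for a single citation.
names = ["authors", "title", "journal", "year"]
perm = decode_permutation([1, 0, 0])   # the title comes before the authors
print([names[i] for i in perm])        # ['title', 'authors', 'journal', 'year']
print(contiguous_labels(perm, [2, 3, 1, 1], names))
print(gmm_log_prob([1, 0, 0], [1.0, 1.0, 1.0]))
```

Because the ordering and the per-element token counts fully determine the labeling, no assignment that splits one element across separated spans can be generated, which is the property the contiguity constraint requires.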