WikiLabel: an encyclopedic approach to labeling documents en masse

  • Author: Tadashi Nomoto
  • Affiliation: National Institute of Japanese Literature, Tachikawa, Japan
  • Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year: 2011

Abstract

This paper presents an approach to labeling multiple documents collectively: it associates the documents with Wikipedia pages and labels them with the headings those pages carry. An obvious advantage over past approaches is that the labels are fluent, since the headings are hand-written by human editors. Experiments on the TDT5 dataset found that the approach works robustly for arbitrary sets of documents in the news domain. Comparisons with several baselines, including the state of the art, came out strongly in favor of our approach.
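
To make the general idea concrete, the following is a minimal toy sketch, not the paper's actual method. It assumes a tiny in-memory stand-in for Wikipedia (`PAGES`), a simple bag-of-words cosine similarity for matching, and a hypothetical helper `label_documents`; none of these names or choices come from the paper, which may associate documents with pages quite differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def label_documents(docs, pages):
    """Return the headings of the page most similar to the documents as a group.

    Assumption: the documents are pooled into one term vector and matched
    against each page's text; this is an illustrative stand-in only.
    """
    pooled = Counter(t for d in docs for t in tokenize(d))
    best = max(pages, key=lambda p: cosine(pooled, Counter(tokenize(p["text"]))))
    return best["headings"]


# Hypothetical miniature "Wikipedia": titles, headings, and text are invented.
PAGES = [
    {"title": "Earthquake",
     "headings": ["Causes", "Effects"],
     "text": "earthquake seismic fault tremor damage magnitude"},
    {"title": "Election",
     "headings": ["Voting systems", "Campaigns"],
     "text": "election vote ballot candidate campaign poll"},
]

# Hypothetical news snippets to be labeled together.
DOCS = [
    "A strong earthquake caused damage along the fault.",
    "Officials measured the tremor at magnitude six.",
]

labels = label_documents(DOCS, PAGES)
print(labels)
```

In this sketch the labels are taken verbatim from the matched page, which is the property the abstract highlights: the output wording was written by human editors rather than generated from document terms.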