Automatic annotation of bibliographical references for descriptive language materials

  • Authors:
  • Harald Hammarström

  • Affiliations:
  • Max Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig, Germany

  • Venue:
  • CLEF'11 Proceedings of the Second international conference on Multilingual and multimodal information access evaluation
  • Year:
  • 2011

Abstract

The present paper considers the problem of annotating bibliographical references with labels/classes, given training data of references already annotated with labels. The problem is an instance of document categorization where the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care when choosing a Machine Learning approach. The present paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.
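
To make the idea of a DNF over title words concrete, the following is a minimal sketch and not the paper's algorithm, which the abstract does not specify. It assumes a single binary label, titles reduced to sets of lowercase words, and conjunctions built from positive literals only; it uses a generic greedy sequential-covering heuristic. All names (`matches`, `learn_conjunction`, `induce_dnf`) and the toy titles are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Conjunction = FrozenSet[str]  # every word in the set must occur in the title
DNF = List[Conjunction]       # the label applies if any conjunction matches


def matches(dnf: DNF, title_words: Set[str]) -> bool:
    """Return True if at least one conjunction is satisfied by the title."""
    return any(conj <= title_words for conj in dnf)


def learn_conjunction(pos: List[Set[str]],
                      neg: List[Set[str]]) -> Optional[Conjunction]:
    """Greedily add words until no negative title is covered any more.

    Assumption: a plain covering heuristic, chosen only for illustration.
    Returns None if the negatives cannot be excluded.
    """
    conj: Set[str] = set()
    pos, neg = list(pos), list(neg)
    while neg:
        candidates = set().union(*pos) - conj
        if not candidates:
            return None

        def score(word: str) -> Tuple[int, int, str]:
            # Prefer words that exclude many negatives, then keep many positives;
            # the word itself is a deterministic tie-breaker.
            return (sum(word not in t for t in neg),
                    sum(word in t for t in pos),
                    word)

        best = max(candidates, key=score)
        remaining_neg = [t for t in neg if best in t]
        if len(remaining_neg) == len(neg):
            return None  # no candidate word excludes any negative title
        conj.add(best)
        pos = [t for t in pos if best in t]
        neg = remaining_neg
    return frozenset(conj)


def induce_dnf(examples: List[Tuple[Set[str], bool]]) -> DNF:
    """Sequential covering: learn one conjunction, drop covered positives, repeat."""
    pos = [words for words, label in examples if label]
    neg = [words for words, label in examples if not label]
    dnf: DNF = []
    while pos:
        conj = learn_conjunction(pos, neg)
        if conj is None:
            break  # remaining positives cannot be separated from negatives
        dnf.append(conj)
        pos = [t for t in pos if not conj <= t]
    return dnf


# Toy multilingual titles; True marks a hypothetical "grammar" label.
examples = [
    ({"a", "grammar", "of", "kham"}, True),
    ({"grammatik", "der", "suaheli", "sprache"}, True),
    ({"a", "dictionary", "of", "kham"}, False),
    ({"worterbuch", "der", "suaheli", "sprache"}, False),
]
dnf = induce_dnf(examples)
is_grammar = matches(dnf, {"grammar", "of", "lezgian"})
is_dictionary_grammar = matches(dnf, {"dictionary", "of", "lezgian"})
```

In this toy run each learned conjunction is a single word, so the formula reads as "grammar OR grammatik". This illustrates why a disjunction of short conjunctions can suit short titles in many languages: each language contributes its own small term, whereas a decision tree would have to encode the same alternatives along nested branches.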