A bootstrapping approach for training a NER with conditional random fields

  • Authors:
  • Jorge Teixeira;Luís Sarmento;Eugénio Oliveira

  • Affiliations:
  • LIACC - FEUP/DEI & Labs Sapo UP, Porto, Portugal;LIACC - FEUP/DEI & Labs Sapo UP, Porto, Portugal;LIACC - FEUP/DEI & Labs Sapo UP, Porto, Portugal

  • Venue:
  • EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence
  • Year:
  • 2011

Abstract

In this paper we present a bootstrapping approach for training a Named Entity Recognition (NER) system. Our method starts by annotating person names in a dataset of 50,000 news items using a simple dictionary-based approach. From this training set we build a classification model based on Conditional Random Fields (CRF). We then use the inferred model to add annotations to the initial seed corpus, which in turn is used to train a new classification model. This cycle is repeated until the NER model stabilizes. We evaluate each bootstrapping iteration by calculating: (i) the precision and recall of the NER model in annotating a small gold-standard collection (HAREM); (ii) the precision and recall of the CRF bootstrapping annotation method over a small sample of news; and (iii) the correctness and the number of new names identified. Additionally, we compare the NER model with a dictionary-based approach, our baseline method. Results show that our bootstrapping approach stabilizes after 7 iterations, reaching a precision of 83% and a recall of 68%.
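The annotate, train, re-annotate cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `train_toy_model` is a hypothetical stand-in that memorizes names and the words preceding them rather than a real CRF, the stopping rule (annotations unchanged between two iterations) is an assumption about what "stabilizes" means, and the three-sentence corpus, seed dictionary, and gold labels are invented purely for illustration.

```python
from typing import Callable, List, Set, Tuple

Sentence = List[str]
Labels = List[str]  # one tag per token: "PER" (person name) or "O" (other)
Model = Callable[[Sentence], Labels]


def dictionary_annotate(sent: Sentence, names: Set[str]) -> Labels:
    """Baseline: tag a token as PER only if it is in the name dictionary."""
    return ["PER" if tok in names else "O" for tok in sent]


def train_toy_model(corpus: List[Sentence], labels: List[Labels]) -> Model:
    """Hypothetical stand-in for CRF training (NOT a CRF).

    Memorizes tokens tagged PER and the words seen immediately before them.
    """
    known: Set[str] = set()
    triggers: Set[str] = set()
    for sent, tags in zip(corpus, labels):
        for i, (tok, tag) in enumerate(zip(sent, tags)):
            if tag == "PER":
                known.add(tok)
                if i > 0:
                    triggers.add(sent[i - 1])

    def predict(sent: Sentence) -> Labels:
        tags = []
        for i, tok in enumerate(sent):
            # A capitalized token right after a learned trigger word is a name.
            after_trigger = (
                i > 0 and sent[i - 1] in triggers and tok[:1].isupper()
            )
            tags.append("PER" if tok in known or after_trigger else "O")
        return tags

    return predict


def bootstrap(
    corpus: List[Sentence], seed_names: Set[str], max_iters: int = 10
) -> Tuple[Model, List[Labels], int]:
    """Seed with the dictionary, then retrain and re-annotate until stable.

    Assumes max_iters >= 1. Stability here means the annotations did not
    change between two consecutive iterations (an assumed criterion).
    """
    labels = [dictionary_annotate(s, seed_names) for s in corpus]
    for iteration in range(1, max_iters + 1):
        model = train_toy_model(corpus, labels)
        new_labels = [model(s) for s in corpus]
        if new_labels == labels:
            break
        labels = new_labels
    return model, labels, iteration


def precision_recall(
    pred: List[Labels], gold: List[Labels]
) -> Tuple[float, float]:
    """Token-level precision and recall for the PER tag."""
    pairs = [(p, g) for ps, gs in zip(pred, gold) for p, g in zip(ps, gs)]
    tp = sum(p == "PER" and g == "PER" for p, g in pairs)
    fp = sum(p == "PER" and g != "PER" for p, g in pairs)
    fn = sum(p != "PER" and g == "PER" for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data (not from the paper's news corpus or HAREM).
corpus = [
    ["President", "Silva", "spoke", "today"],
    ["President", "Costa", "visited", "Porto"],
    ["Minister", "Costa", "met", "Minister", "Santos"],
]
gold = [
    ["O", "PER", "O", "O"],
    ["O", "PER", "O", "O"],
    ["O", "PER", "O", "O", "PER"],
]
seed_names = {"Silva"}

baseline = [dictionary_annotate(s, seed_names) for s in corpus]
model, final_labels, iterations = bootstrap(corpus, seed_names)
print("baseline (P, R):", precision_recall(baseline, gold))
print("bootstrapped (P, R):", precision_recall(final_labels, gold))
print("iterations until stable:", iterations)
```

Each pass lets context learned from already-tagged names surface new names, which then contribute new context in the next pass. In a real setting the stand-in learner would be replaced by a CRF toolkit, and precision and recall would be measured against a held-out gold standard rather than the training corpus itself.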