Semi-supervised learning for Portuguese noun phrase extraction

  • Authors:
  • Ruy Milidiú;Cicero Santos;Julio Duarte;Raúl Rentería

  • Affiliations:
  • Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil;Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil;Centro Tecnológico do Exército, Rio de Janeiro, Brazil;Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil

  • Venue:
  • PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2006

Abstract

Semi-supervised learning is frequently used when only a small labeled training set is available alongside a large set of unlabeled samples. In this paper, we combine Hidden Markov Models and Transformation-Based Learning in a semi-supervised learning approach. Self-training and Co-training are the two semi-supervised techniques that we apply to our scheme in order to classify Portuguese noun phrases. Our main goal is to show that effective noun phrase extraction can be achieved with fewer tagged examples by applying a semi-supervised technique. Our models show good improvement when the labeled corpus is small and little improvement when it is large.
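
As a rough illustration of the self-training idea mentioned in the abstract, the sketch below shows a generic self-training loop. It is not the authors' method: the classifier is a toy word-to-tag frequency model standing in for the HMM and Transformation-Based Learning components, and the names `train`, `predict`, `self_train`, the `threshold` parameter, the B/I/O chunk tags, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count how often each word carries each chunk tag (toy stand-in, not HMM/TBL)."""
    counts = defaultdict(Counter)
    for sentence in labeled:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def predict(model, words):
    """Tag a sentence; confidence is the lowest per-word relative tag frequency."""
    tagged, confidence = [], 1.0
    for word in words:
        if word not in model:
            return None, 0.0  # unknown word: no confident prediction
        tag, n = model[word].most_common(1)[0]
        confidence = min(confidence, n / sum(model[word].values()))
        tagged.append((word, tag))
    return tagged, confidence


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly retrain, moving confidently tagged sentences into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        remaining, added = [], 0
        for words in pool:
            tagged, conf = predict(model, words)
            if tagged is not None and conf >= threshold:
                labeled.append(tagged)  # accept the model's own labels
                added += 1
            else:
                remaining.append(words)
        pool = remaining
        if added == 0:  # nothing new was labeled, so further rounds cannot help
            break
    return train(labeled), labeled, pool


# Hypothetical toy data: B = begins a noun phrase, I = inside one, O = outside.
seed = [
    [("o", "B"), ("gato", "I"), ("dorme", "O")],
    [("a", "B"), ("casa", "I"), ("cai", "O")],
]
raw = [["o", "gato", "cai"], ["um", "gato", "dorme"]]
model, grown, leftover = self_train(seed, raw)
```

Co-training follows the same pattern, except that two different models are trained and each one supplies confidently labeled examples for the other; how the paper pairs its two learners is not specified in the abstract.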