A lexical database of Portuguese multiword expressions

  • Authors:
  • Sandra Antunes; Maria Fernanda Bacelar do Nascimento; João Miguel Casteleiro; Amália Mendes; Luísa Pereira; Tiago Sá

  • Affiliations:
  • Centro de Linguística da Universidade de Lisboa (CLUL), Lisboa, Portugal (all authors)

  • Venue:
  • PROPOR'06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2006

Abstract

This presentation focuses on an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units. The units are automatically extracted from a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, as well as lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. The new resource has a twofold objective: (i) to serve as an important research tool supporting the development of typologies of MW units, and (ii) to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
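The abstract does not say which lexical association measures the project used. As a minimal sketch of the extraction-and-ranking step, the code below assumes pointwise mutual information (PMI) over adjacent word pairs, a common association measure swapped in here purely for illustration. The function name `bigram_pmi` and the `min_freq` threshold are hypothetical, the sketch only scores two-word sequences, and it makes no claim about how the project handled longer units or named entities. The ranked list is meant as input to the manual validation step the abstract describes, not as a final result.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_freq=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch only: PMI is an assumed stand-in for the
    (unspecified) association measures used in the project.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from raw corpus counts. Pairs seen fewer than `min_freq`
    times are skipped, because PMI overrates very rare co-occurrences.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of bigram positions; guard against empty input

    scores = {}
    for (x, y), f_xy in bigrams.items():
        if f_xy < min_freq:
            continue
        p_xy = f_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs first: these are the MW candidates to validate by hand.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On the toy token list `"ele tomou conta da casa e ela tomou conta do filho a casa de campo e a casa".split()` with `min_freq=2`, only two pairs pass the threshold, and `("tomou", "conta")` (PMI ≈ 3.25) ranks above `("a", "casa")` (PMI ≈ 2.67), which fits *tomar conta* ('take care of') being a lexicalized expression. On a real 50-million-word corpus, a measure like this would only produce a candidate list, which is why hand validation remains necessary.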