A Lexical Database of Portuguese Multiword Expressions

Authors:
Sandra Antunes;Maria Fernanda Bacelar do Nascimento;João Miguel Casteleiro;Amália Mendes;Luísa Pereira;Tiago Sá
Affiliations:
Centro de Linguística da Universidade de Lisboa (CLUL), Lisboa, Portugal (all authors)
Venue:
PROPOR '06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
Year:
2006

Abstract

This presentation focuses on an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units. The units are extracted automatically from a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, as well as lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a twofold objective: to serve as an important research tool supporting the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
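The abstract does not say which lexical association measures the project applied. As a minimal sketch, assuming adjacent word pairs (bigrams) as MW candidates, the Python function below scores each candidate with two widely used measures, pointwise mutual information (PMI) and Dunning's log-likelihood ratio (G²), both computed from a 2×2 contingency table of bigram counts. The function name `association_scores`, the `min_count` threshold and the choice of these two measures are hypothetical, chosen for illustration; they do not describe the project's actual pipeline.

```python
import math
from collections import Counter


def association_scores(tokens, min_count=2):
    """Return {(w1, w2): (pmi, g2)} for adjacent word pairs.

    Illustrative sketch only: the measures (PMI, G^2) and the frequency
    threshold are assumptions, not the project's documented method.
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)                    # number of bigram positions
    pair_counts = Counter(pairs)
    first = Counter(tokens[:-1])      # how often each word opens a bigram
    second = Counter(tokens[1:])      # how often each word closes a bigram
    scores = {}
    for (w1, w2), o11 in pair_counts.items():
        if o11 < min_count:           # skip rare pairs, where PMI is unreliable
            continue
        r1, c1 = first[w1], second[w2]
        r2, c2 = n - r1, n - c1
        # Observed cells: w1 w2, w1 ¬w2, ¬w1 w2, ¬w1 ¬w2
        o12 = r1 - o11
        o21 = c1 - o11
        o22 = n - o11 - o12 - o21
        observed = [o11, o12, o21, o22]
        # Expected cells under independence: row total * column total / n
        expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
        pmi = math.log2(o11 * n / (r1 * c1))
        # Zero cells contribute nothing to the G^2 sum (0 * log 0 := 0)
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        scores[(w1, w2)] = (pmi, g2)
    return scores


if __name__ == "__main__":
    text = "a casa de banho fica ao lado da casa de banho antiga"
    ranked = sorted(association_scores(text.split()).items(),
                    key=lambda item: item[1][1], reverse=True)
    for pair, (pmi, g2) in ranked:
        print(pair, round(pmi, 2), round(g2, 2))
```

In a workflow like the one the abstract outlines, candidates would be sorted by such a score and the top of the list handed to human annotators for validation. The frequency threshold is there because PMI tends to overrate pairs seen only once or twice, whereas G² is more robust for low counts.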