Word association norms, mutual information, and lexicography
Computational Linguistics
Collocation Mining: Exploiting Corpora for Collocation, Identification and Representation
KONVENS 2000 / Sprachkommunikation, Vorträge der gemeinsamen Veranstaltung 5. Konferenz zur Verarbeitung natürlicher Sprache (KONVENS), 6. ITG-Fachtagung "Sprachkommunikation"
Multiword Expressions: A Pain in the Neck for NLP
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Methods for the qualitative evaluation of lexical association measures
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Proposal for multi-word expression annotation in running text
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
Hi-index | 0.00 |
This presentation focuses on an ongoing project which aims at the creation of a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50 million word corpus, statistically interpreted with lexical association measures and validated by hand. This database covers different types of MW units, like named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool which supports the development of MW units typologies; to be of major help in developing and evaluating language processing tools able of dealing with MW expressions.