Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. In particular, the lack of coverage of MWEs in resources can negatively impact the performance of tasks and applications, and can lead to loss of information or communication errors. This is especially problematic in technical domains, where a significant portion of the vocabulary is composed of MWEs. This paper investigates the use of a statistically driven, alignment-based approach to the identification of MWEs in technical corpora. We look at the use of several sources of data, including parallel corpora, using English and Portuguese data from a corpus of Pediatrics, and examine how a second language can provide relevant cues for this task. We report results obtained by a combination of statistical measures and linguistic information, and compare these to those reported in the literature. Such an approach to the (semi-)automatic identification of MWEs can considerably speed up lexicographic work by providing a more targeted list of MWE candidates.
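The abstract does not spell out the pipeline. Purely as an illustration, the Python sketch below shows one way an alignment-based extractor could work. Source-side word sequences that a word aligner links to the target as a single unit are taken as MWE candidates. These are then filtered by part-of-speech patterns (standing in for "linguistic information") and ranked by an association score (standing in for "statistical measures"). Every concrete detail is an assumption. The toy English-Portuguese sentences and hand-written alignment links replace real aligner output. The tagset, `ALLOWED_PATTERNS`, and all function names are hypothetical. Generalised pointwise mutual information (PMI) is just one possible measure. This is not the authors' implementation.

```python
from collections import Counter
from math import log2, prod

# Hypothetical toy data: (source tokens, source POS tags, target tokens,
# alignment links as (src_index, tgt_index)). Real links would come from a
# statistical word aligner run over a parallel corpus; these are hand-written.
CORPUS = [
    (["high", "blood", "pressure", "in", "children"],
     ["A", "N", "N", "P", "N"],
     ["pressão", "arterial", "alta", "em", "crianças"],
     [(0, 2), (1, 0), (1, 1), (2, 0), (2, 1), (3, 3), (4, 4)]),
    (["blood", "pressure", "was", "measured"],
     ["N", "N", "V", "V"],
     ["a", "pressão", "arterial", "foi", "medida"],
     [(0, 1), (0, 2), (1, 1), (1, 2), (2, 3), (2, 4), (3, 4)]),
    (["heart", "rate", "and", "blood", "pressure"],
     ["N", "N", "C", "N", "N"],
     ["frequência", "cardíaca", "e", "pressão", "arterial"],
     [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)]),
]

# Hypothetical POS patterns acting as the linguistic filter.
ALLOWED_PATTERNS = {("N", "N"), ("A", "N"), ("A", "N", "N")}


def source_groups(links):
    """Connected components of the alignment graph, as sorted source indices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in links:
        parent[find(("s", s))] = find(("t", t))  # union source and target node

    groups = {}
    for node in list(parent):
        if node[0] == "s":
            groups.setdefault(find(node), []).append(node[1])
    return [sorted(g) for g in groups.values()]


def extract_candidates(corpus):
    """Count contiguous source n-grams (n >= 2) aligned as one unit and POS-filtered."""
    found = Counter()
    for src, pos, _tgt, links in corpus:
        for idx in source_groups(links):
            contiguous = idx == list(range(idx[0], idx[-1] + 1))
            if len(idx) >= 2 and contiguous:
                tags = tuple(pos[i] for i in idx)
                if tags in ALLOWED_PATTERNS:
                    found[tuple(src[i].lower() for i in idx)] += 1
    return found


def pmi(ngram, ngram_counts, unigrams, total):
    """Generalised PMI: log2(c(w1..wn) * N**(n-1) / prod c(wi))."""
    n = len(ngram)
    return log2(ngram_counts[ngram] * total ** (n - 1)
                / prod(unigrams[w] for w in ngram))


def rank_candidates(corpus):
    """Score each aligned candidate by PMI over the source side, best first."""
    candidates = extract_candidates(corpus)
    sentences = [[w.lower() for w in src] for src, *_ in corpus]
    unigrams = Counter(w for sent in sentences for w in sent)
    total = sum(unigrams.values())
    ngram_counts = Counter()
    for sent in sentences:
        for cand in candidates:
            n = len(cand)
            ngram_counts[cand] += sum(
                tuple(sent[i:i + n]) == cand for i in range(len(sent) - n + 1))
    scored = {c: pmi(c, ngram_counts, unigrams, total) for c in candidates}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for ngram, score in rank_candidates(CORPUS):
        print(" ".join(ngram), round(score, 3))
    # heart rate 3.807
    # blood pressure 2.222
```

Grouping links into connected components lets an n:m alignment such as *blood pressure* / *pressão arterial* count as a single unit. The POS filter discards *was measured* even though it is aligned as a block. PMI is known to favour rare pairs, which is why *heart rate* (seen once) outranks *blood pressure* (seen three times) in this toy data.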