We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.
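The candidate-proposal step can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes what the abstract states (1:1 alignments act as anchors, and the material between them is a candidate). The function names, the `min_len` parameter, the representation of alignments as a set of `(source_index, target_index)` pairs, and the use of raw n-gram counts as the monolingual ranking score are all hypothetical choices made for illustration.

```python
from collections import Counter


def one_to_one_anchors(links):
    """Map source index -> target index for links where both ends occur in exactly one link.

    `links` is assumed to be a set of (source_index, target_index) pairs.
    """
    src_counts = Counter(i for i, _ in links)
    tgt_counts = Counter(j for _, j in links)
    return {i: j for i, j in links if src_counts[i] == 1 and tgt_counts[j] == 1}


def misaligned_candidates(source, target, links, min_len=2):
    """Propose (source span, target span) pairs from the gaps between 1:1 anchors."""
    anchors = one_to_one_anchors(links)
    anchored_tgt = set(anchors.values())
    candidates, start = [], None
    for i in range(len(source) + 1):
        inside_gap = i < len(source) and i not in anchors
        if inside_gap:
            if start is None:
                start = i  # a run of non-anchored source tokens begins here
            continue
        if start is not None:
            if i - start >= min_len:
                # Target positions of the anchors on either side of the gap
                # (sentence boundaries when there is no anchor).
                lo = anchors.get(start - 1, -1)
                hi = anchors.get(i, len(target))
                lo, hi = min(lo, hi), max(lo, hi)  # tolerate reordering
                tgt = tuple(target[j] for j in range(lo + 1, hi)
                            if j not in anchored_tgt)
                candidates.append((tuple(source[start:i]), tgt))
            start = None
    return candidates


def rank_by_corpus_count(candidates, corpus_tokens, min_count=1):
    """Rank candidates by how often the source span occurs in a monolingual corpus.

    Raw frequency is a stand-in for whatever association score is actually used.
    """
    def count(ngram):
        n = len(ngram)
        return sum(tuple(corpus_tokens[k:k + n]) == ngram
                   for k in range(len(corpus_tokens) - n + 1))

    scored = [(count(src), src, tgt) for src, tgt in candidates]
    kept = [s for s in scored if s[0] >= min_count]
    return sorted(kept, key=lambda s: s[0], reverse=True)


# Toy example: "kicked" and "bucket" both link to "mort", so neither link is 1:1.
source = ["he", "kicked", "the", "bucket", "yesterday"]
target = ["il", "est", "mort", "hier"]
links = {(0, 0), (1, 2), (3, 2), (4, 3)}
corpus = "she kicked the bucket and he kicked the bucket too".split()

candidates = misaligned_candidates(source, target, links)
print(rank_by_corpus_count(candidates, corpus))
```

In the toy sentence pair, only "he"/"il" and "yesterday"/"hier" are 1:1 anchors, so the source span between them is proposed together with the non-anchored target tokens that lie between the same anchors. A realistic setup would read alignments from a word aligner's output and replace the frequency score with a proper association measure.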