Multiword Expressions: A Pain in the Neck for NLP
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
A statistical information extraction system for Turkish
Natural Language Engineering
Extracting the unextractable: a case study on verb-particles
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Extracting multiword expressions with a semantic tagger
MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
Machine translation between Turkic languages
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Editorial: Introduction to the special issue on multiword expressions: Having a crack at a hard nut
Computer Speech and Language
Semitic '07 Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Multi-word expression recognition integrated with two-level finite state transducer
HCI'07 Proceedings of the 12th international conference on Human-computer interaction: intelligent multimodal interaction environments
Identifying multi-word expressions by leveraging morphological and syntactic idiosyncrasy
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Collocation extraction in Turkish texts using statistical methods
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Accommodating multiword expressions in an arabic LFG grammar
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Multiword expressions in statistical dependency parsing
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Hi-index | 0.00 |
This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications. In addition to the fairly standard set of lexicalized collocations and multi-word expressions such as named-entities, Turkish uses a quite wide range of semi-lexicalized and non-lexicalized collocations. After an overview of relevant aspects of Turkish, we present a description of the multi-word expressions we handle. We then summarize the computational setting in which we employ a series of components for tokenization, morphological analysis, and multi-word expression extraction. We finally present results from runs over a large corpus and a small gold-standard corpus.