Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and for corpus linguistics. Although various statistically driven and knowledge-based approaches have been proposed and tested, efficient MWE extraction remains an unsolved problem. In this paper, we present research in which we approach the MWE problem with a semantic field annotator. We use an English semantic tagger (USAS), developed at Lancaster University, to identify multiword units that denote single semantic concepts. The Meter Corpus (Gaizauskas et al., 2001; Clough et al., 2002), built in Sheffield, was used to evaluate our approach. In our evaluation, the approach extracted a total of 4,195 MWE candidates; after manual checking, 3,792 of them were accepted as valid MWEs, giving a precision of 90.39% (3,792/4,195) and an estimated recall of 39.38%. Of the accepted MWEs, 2,587 (68.22%) are low-frequency terms occurring only once or twice in the corpus. These results show that our approach provides a practical solution to MWE extraction.
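The evaluation figures above follow from simple counts. The Python sketch below is a minimal illustration and not the authors' implementation. It assumes a hypothetical tagged-token format in which every token is `(word, semantic_tag, mwe_id)` and the tokens of one multiword unit share an id; this is not the documented USAS output format, and the tags `X1`..`X3` are placeholders. Under that assumption it shows how units marked by a tagger could be collected as MWE candidates, and how the reported precision and low-frequency share are derived from the counts.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# ASSUMED input format (illustrative only, not USAS's actual output format):
# each token is (word, semantic_tag, mwe_id); mwe_id is None for single-word
# tokens, and all tokens of one multiword unit share the same integer id.
Token = Tuple[str, str, Optional[int]]


def group_mwes(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    """Join tokens sharing an mwe_id into (lowercased expression, tag) pairs."""
    words_by_id: Dict[int, List[str]] = {}
    tag_by_id: Dict[int, str] = {}
    for word, tag, mwe_id in tokens:
        if mwe_id is None:
            continue  # single-word unit: not an MWE candidate
        words_by_id.setdefault(mwe_id, []).append(word)
        tag_by_id.setdefault(mwe_id, tag)  # keep the tag of the unit's first token
    # dicts preserve insertion order, so candidates come out in order of first appearance
    return [(" ".join(ws).lower(), tag_by_id[i]) for i, ws in words_by_id.items()]


def precision(accepted: int, candidates: int) -> float:
    """Share of extracted candidates accepted as valid after manual checking."""
    return accepted / candidates


def low_frequency_share(freqs: Counter, max_freq: int = 2) -> float:
    """Share of accepted MWEs whose corpus frequency is at most max_freq."""
    low = sum(1 for f in freqs.values() if f <= max_freq)
    return low / len(freqs)


if __name__ == "__main__":
    # Placeholder tags X1..X3; ids are assumed unique within a document.
    sentence = [("He", "X1", None), ("gave", "X2", 1), ("up", "X2", 1), ("smoking", "X3", None)]
    print(group_mwes(sentence))  # -> [('gave up', 'X2')]
    # Reproduce the reported figures from the counts given in the abstract.
    print(f"precision: {precision(3792, 4195):.2%}")  # -> precision: 90.39%
    print(f"low-frequency share: {2587 / 3792:.2%}")  # -> low-frequency share: 68.22%
```

The estimated recall (39.38%) cannot be reproduced this way, because it depends on a reference set of MWEs that the abstract does not describe.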