An n-gram frequency database reference to handle MWE extraction in NLP applications

Authors: Patrick Watrin; Thomas François
Affiliations: Centre for Natural Language Processing, Institut Langage et Communication, UCLouvain; Aspirant F.N.R.S., Centre for Natural Language Processing, Institut Langage et Communication, UCLouvain
Venue: MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Year: 2011

Abstract

The identification and extraction of multiword expressions (MWEs) currently deliver satisfactory results. However, integrating these results into a wider application remains an issue, mainly for two reasons: the association measures (AMs) used to detect MWEs require a critical amount of data, and MWE dictionaries cannot account for all the lexical and syntactic variations inherent in MWEs. In this study, we use an alternative technique to overcome these limitations. It consists of defining an n-gram frequency database that can be used to compute AMs on the fly, allowing the extraction procedure to efficiently process all the MWEs in a text, even those that have not been previously observed.
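
To make the idea of computing AMs from stored n-gram frequencies more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how the database is stored or which AMs are used, so the in-memory `NgramFrequencyDB` class, its method names, and the choice of pointwise mutual information (PMI) as the example measure are all assumptions made for illustration.

```python
import math
from collections import Counter


class NgramFrequencyDB:
    """Toy in-memory n-gram count store (hypothetical stand-in for a real database)."""

    def __init__(self, max_n=2):
        self.max_n = max_n
        self.counts = Counter()   # n-gram tuple -> frequency
        self.totals = Counter()   # n-gram order -> number of n-gram tokens seen

    def add_text(self, tokens):
        """Count every n-gram of order 1..max_n in a tokenized reference text."""
        for n in range(1, self.max_n + 1):
            for i in range(len(tokens) - n + 1):
                self.counts[tuple(tokens[i:i + n])] += 1
                self.totals[n] += 1

    def freq(self, *words):
        """Frequency of the n-gram made of `words` (0 if never seen)."""
        return self.counts[tuple(words)]

    def pmi(self, w1, w2):
        """PMI of a bigram, computed at query time from the stored counts."""
        joint = self.freq(w1, w2)
        if joint == 0:
            return float("-inf")  # bigram unseen in the reference counts
        p_xy = joint / self.totals[2]
        p_x = self.freq(w1) / self.totals[1]
        p_y = self.freq(w2) / self.totals[1]
        return math.log2(p_xy / (p_x * p_y))


# Build the counts once from a (tiny, made-up) reference corpus.
reference = "new york is big new york is old the city is new".split()
db = NgramFrequencyDB(max_n=2)
db.add_text(reference)

# Later, score candidate bigrams from a new text without re-reading the corpus.
candidates = [("new", "york"), ("is", "new"), ("york", "new")]
scores = {pair: db.pmi(*pair) for pair in candidates}
for pair, score in scores.items():
    print(pair, score)
```

The point of the design is that the counts are collected once, from a large reference corpus, while the measure itself is evaluated only when a candidate is encountered. A short input text therefore does not need to supply enough occurrences of its own for the measure to be estimated.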