Unsupervised type and token identification of idiomatic expressions

Authors:
Afsaneh Fazly;Paul Cook;Suzanne Stevenson
Venue:
Computational Linguistics
Year:
2009

Abstract

Idiomatic expressions are plentiful in everyday language, yet they remain mysterious, as it is not clear exactly how people learn and understand them. They are of special interest to linguists, psycholinguists, and lexicographers, mainly because of their syntactic and semantic idiosyncrasies as well as their unclear lexical status. Despite a great deal of research on the properties of idioms in the linguistics literature, there is not much agreement on which properties are characteristic of these expressions. Because of their peculiarities, idiomatic expressions have mostly been overlooked by researchers in computational linguistics. In this article, we look into the usefulness of some of the identified linguistic properties of idioms for their automatic recognition. Specifically, we develop statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text. We use these statistical measures in a type-based classification task where we automatically separate idiomatic expressions (expressions with a possible idiomatic interpretation) from similar-on-the-surface literal phrases (for which no idiomatic interpretation is possible). In addition, we use some of the measures in a token identification task where we distinguish idiomatic and literal usages of potentially idiomatic expressions in context.
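The abstract does not spell out the statistical measures, so the following is only a minimal sketch of how a usage-pattern measure and a type-based decision could fit together. It assumes (this is not stated in the abstract) that an expression's fixedness is scored as the KL divergence between the distribution of syntactic patterns observed for that expression and a background distribution over the same patterns. The pattern names, the counts, the smoothing constant, and the threshold are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical inventory of syntactic patterns a verb-noun pair may occur in.
PATTERNS = ["active_det_sg", "active_det_pl", "passive", "active_nodet_sg"]


def pattern_distribution(counts, patterns, alpha=1.0):
    """Add-alpha smoothed relative frequencies over a fixed pattern inventory."""
    total = sum(counts.get(p, 0) for p in patterns) + alpha * len(patterns)
    return {p: (counts.get(p, 0) + alpha) / total for p in patterns}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; both arguments share the same keys."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def fixedness(pair_counts, background_counts, patterns=PATTERNS):
    """Assumed measure: how far a pair's pattern usage departs from typical usage."""
    p = pattern_distribution(pair_counts, patterns)
    q = pattern_distribution(background_counts, patterns)
    return kl_divergence(p, q)


def classify_type(score, threshold):
    """Type-level decision: scores above the (hypothetical) threshold count as idiomatic."""
    return "idiomatic" if score > threshold else "literal"


# Toy counts, invented for illustration only.
background = Counter({"active_det_sg": 400, "active_det_pl": 300,
                      "passive": 200, "active_nodet_sg": 100})
idiom_like = Counter({"active_det_sg": 95, "active_det_pl": 2,
                      "passive": 1, "active_nodet_sg": 2})
literal_like = Counter({"active_det_sg": 40, "active_det_pl": 30,
                        "passive": 20, "active_nodet_sg": 10})

if __name__ == "__main__":
    for name, counts in [("idiom_like", idiom_like), ("literal_like", literal_like)]:
        score = fixedness(counts, background)
        print(name, round(score, 3), classify_type(score, threshold=0.5))
```

In this sketch, an expression that occurs almost exclusively in one pattern receives a higher score than one whose usage mirrors the background distribution. The same score could in principle be compared against the pattern of an individual occurrence for the token-level task, though the abstract gives no detail on how that is done.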