Automatic detection of idiomatic clauses

Authors:
Anna Feldman; Jing Peng
Affiliations:
Department of Computer Science, Montclair State University, Montclair, NJ and Department of Linguistics, Montclair State University, Montclair, NJ; Department of Computer Science, Montclair State University, Montclair, NJ
Venue:
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2013

Abstract

We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier in that subspace to measure accuracy. We discuss the pros and cons of each approach. All of the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
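
The two approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify features, dimensionality, or thresholds, so the 2-D vectors, the choice of a single principal component, the two-class Fisher form of linear discriminant analysis, and all function names below are assumptions made for illustration. The first part scores each vector by its reconstruction error from the leading principal component (a large error suggests an outlier); the second part projects labeled vectors onto a discriminant direction and classifies a query by majority vote among its three nearest neighbors.

```python
import math

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def center(rows, mu):
    return [[x - m for x, m in zip(r, mu)] for r in rows]

def scatter(rows):
    """Sum of outer products of already-centered rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]

def top_component(S, iters=200):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in S]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pca_outlier_scores(rows):
    """Distance of each row from the line spanned by the first principal component."""
    X = center(rows, mean_vec(rows))
    v = top_component(scatter(X))
    scores = []
    for r in X:
        t = sum(a * b for a, b in zip(r, v))  # coordinate along the component
        scores.append(math.sqrt(sum((a - t * b) ** 2 for a, b in zip(r, v))))
    return scores

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fisher_direction(X0, X1):
    """Two-class discriminant direction: solve Sw w = m1 - m0."""
    m0, m1 = mean_vec(X0), mean_vec(X1)
    S0, S1 = scatter(center(X0, m0)), scatter(center(X1, m1))
    Sw = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(S0, S1)]
    return solve(Sw, [a - b for a, b in zip(m1, m0)])

def knn3(train_z, train_y, z):
    """Majority label among the three nearest 1-D projections."""
    nearest = sorted(zip(train_z, train_y), key=lambda p: abs(p[0] - z))[:3]
    return int(sum(label for _, label in nearest) >= 2)

# Approach 1 (unsupervised): hypothetical sentence vectors; the last one is off-trend.
sentences = [[-5.0, 0.1], [-3.0, -0.1], [-1.0, 0.0], [1.0, 0.1],
             [3.0, -0.1], [5.0, 0.0], [0.0, 3.0]]
scores = pca_outlier_scores(sentences)
outlier = scores.index(max(scores))  # index of the most outlying vector

# Approach 2 (supervised): hypothetical labeled vectors, 0 = literal, 1 = idiomatic.
literal = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
idiomatic = [[5.0, 6.0], [6.0, 5.0], [5.5, 5.5], [6.0, 6.5]]
w = fisher_direction(literal, idiomatic)

def project(x):
    return sum(a * b for a, b in zip(w, x))

train_z = [project(x) for x in literal + idiomatic]
train_y = [0] * len(literal) + [1] * len(idiomatic)
prediction = knn3(train_z, train_y, project([5.0, 5.0]))
```

The contrast the abstract draws is visible in the inputs: `pca_outlier_scores` never sees a label, whereas `fisher_direction` and `knn3` need labeled examples of both classes. In practice a library implementation of these techniques would replace the hand-written linear algebra.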