We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Detecting idioms as lexical outliers does not exploit class label information, so in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier in that subspace to measure classification accuracy. We discuss the pros and cons of each approach. Both approaches are more general than previous algorithms for idiom detection: they do not rely on target idiom types, lexicons, or large manually annotated corpora, and they do not limit the search space to a particular type of linguistic construction.
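The two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature representation, so the inputs are assumed to be plain numeric vectors, one per sentence. The function names (pca_outlier_scores, fisher_direction, knn3_predict), the use of reconstruction error as the outlier score, the restriction to two classes, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def pca_outlier_scores(background, candidates, n_components=1):
    """Score candidates by their distance from a PCA subspace.

    The subspace is fitted on `background` vectors (assumed here to be
    mostly literal text); a large residual marks a candidate as an outlier.
    """
    background = np.asarray(background, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    mean = background.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(background - mean, full_matrices=False)
    basis = vt[:n_components]
    centered = candidates - mean
    residual = centered - centered @ basis.T @ basis
    return np.linalg.norm(residual, axis=1)


def fisher_direction(X, y):
    """Two-class linear discriminant direction w = Sw^{-1} (m1 - m0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Small ridge term (an assumption) keeps Sw invertible.
    Sw += 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)


def knn3_predict(train_z, train_y, test_z):
    """Majority vote among the three nearest 1-D projections."""
    train_z = np.asarray(train_z, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for z in np.atleast_1d(test_z):
        nearest = np.argsort(np.abs(train_z - z))[:3]
        preds.append(int(train_y[nearest].sum() >= 2))
    return np.array(preds)


# Toy data: label 1 stands in for "idiomatic", 0 for "literal".
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

w = fisher_direction(X, y)
new_points = np.array([[5.5, 5.5], [0.5, 0.5]])
labels = knn3_predict(X @ w, y, new_points @ w)

# Background vectors lie on a line; the second candidate lies far off it.
scores = pca_outlier_scores(
    [[0, 0], [1, 1], [2, 2], [3, 3]],
    [[2.5, 2.5], [0, 3]],
)
```

The outlier scorer never sees labels, whereas the discriminant direction is computed from them; this is the contrast between the two approaches. With more than two classes the discriminant subspace has more than one dimension, and the nearest-neighbor distance would be computed over all of its coordinates.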