Unsupervised learning of P NP P word combinations

Authors:
Sofía N. Galicia-Haro;Alexander Gelbukh
Affiliations:
Faculty of Sciences, UNAM Universitary City, Mexico City, Mexico;Center for Computing Research, National Polytechnic Institute, Mexico
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), that is, groups of three or more simple forms that behave as a single lexical unit and have semantic and syntactic properties not deducible from the corresponding properties of each simple form, e.g., by means of, in order to, in front of. We show that idiomatic P NP P combinations have some statistical properties distinct from those of usual idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Among the other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and point-wise mutual information perform very poorly. We experiment on Spanish material.
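
As a rough illustration of the kind of ranking the abstract compares, the following minimal sketch counts trigrams whose first and third tokens are prepositions and scores each by raw frequency and by point-wise mutual information. It is not the authors' procedure: the tiny preposition list, the function name `rank_p_n_p`, the restriction to a one-word middle element (the paper allows a full noun phrase), and the three-way independence form of PMI are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny preposition list (assumption, not from the paper).
PREPOSITIONS = {"a", "de", "en", "con", "por", "para", "sin"}


def rank_p_n_p(tokens, prepositions=PREPOSITIONS):
    """Return (trigram, frequency, pmi) rows for preposition-word-preposition trigrams.

    Rows are sorted by descending frequency, ties broken alphabetically.
    PMI here is log2 of P(w1 w2 w3) over P(w1) * P(w2) * P(w3), one common
    trigram generalization; the paper may define it differently.
    """
    unigrams = Counter(tokens)
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    n_uni = len(tokens)
    n_tri = max(len(tokens) - 2, 1)

    rows = []
    for (w1, w2, w3), freq in trigrams.items():
        # Keep only P x P patterns whose middle token is not itself a preposition.
        if w1 in prepositions and w3 in prepositions and w2 not in prepositions:
            p_tri = freq / n_tri
            p_indep = (
                (unigrams[w1] / n_uni)
                * (unigrams[w2] / n_uni)
                * (unigrams[w3] / n_uni)
            )
            rows.append(((w1, w2, w3), freq, math.log2(p_tri / p_indep)))

    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    sample = "a fin de que venga a fin de mes con el fin de semana".split()
    for trigram, freq, pmi in rank_p_n_p(sample):
        print(" ".join(trigram), freq, round(pmi, 2))
```

Sorting by the frequency column corresponds to the baseline the abstract reports as most effective; sorting the same rows by the PMI column gives the alternative ranking that the abstract reports as performing poorly for this type of expression.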