Estimating satisfactoriness of selectional restriction from corpus without a thesaurus

Authors:
Yoichi Tomiura;Shosaku Tanaka;Toru Hitaka
Affiliations:
Kyushu University, Fukuoka, Japan;Ritsumeikan University, Kyoto, Japan;Kyushu University (retired March 2003)
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2005



Abstract

A selectional restriction specifies which combinations of words are semantically valid in a particular syntactic construction. It is a basic and important piece of knowledge in natural language processing and has been used for syntactic disambiguation and word sense disambiguation. When selectional restrictions for many word combinations are acquired from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus satisfies the selectional restriction. This paper proposes a new method, based on a multiple regression model, for estimating from a tagged corpus the degree to which a word combination satisfies the selectional restriction. The independent variables of this model correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are themselves parameters to be learned. We experiment on estimating the degree of satisfaction of the selectional restriction for Japanese word combinations 〈noun, postpositional-particle, verb〉. The experimental results indicate that our method can estimate the degree of satisfaction for word combinations that are only sparsely observed in the corpus, and that syntactic disambiguation using the co-occurrences estimated by our method is more accurate than disambiguation using co-occurrence probabilities smoothed by previous methods.
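
To make the idea of a regression whose independent variables are themselves learned more concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not spell out. It assumes each modifier noun gets a small latent vector (the learned "independent variables"), each 〈postpositional-particle, verb〉 slot gets regression coefficients and an intercept, and both are fitted jointly by stochastic gradient descent on squared error. The function names `fit` and `score`, the latent dimension, the learning rate, the toy romanized Japanese triples, and the 0/1 target values are all hypothetical illustrations, not data or settings from the paper.

```python
import random

# Hypothetical toy data: (noun, (particle, verb), assumed degree of satisfaction).
# mizu = water, ocha = tea, pan = bread, gohan = rice;
# nomu = drink, taberu = eat, sosogu = pour, kamu = chew.
OBS = [
    ("mizu", ("wo", "nomu"), 1.0), ("mizu", ("wo", "taberu"), 0.0),
    ("mizu", ("wo", "sosogu"), 1.0), ("mizu", ("wo", "kamu"), 0.0),
    ("ocha", ("wo", "nomu"), 1.0), ("ocha", ("wo", "taberu"), 0.0),
    ("ocha", ("wo", "kamu"), 0.0),            # (ocha, wo, sosogu) is unobserved
    ("pan", ("wo", "nomu"), 0.0), ("pan", ("wo", "taberu"), 1.0),
    ("pan", ("wo", "sosogu"), 0.0), ("pan", ("wo", "kamu"), 1.0),
    ("gohan", ("wo", "nomu"), 0.0), ("gohan", ("wo", "taberu"), 1.0),
    ("gohan", ("wo", "sosogu"), 0.0),         # (gohan, wo, kamu) is unobserved
]

def score(x, w, b, noun, slot):
    """Regression output: intercept plus coefficients times the noun's variables."""
    return b[slot] + sum(wk * xk for wk, xk in zip(w[slot], x[noun]))

def fit(obs, dim=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Jointly learn noun variables x and per-slot coefficients w, b (assumed setup)."""
    rng = random.Random(seed)
    x = {n: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for n, _, _ in obs}
    w = {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _, s, _ in obs}
    b = {s: 0.0 for _, s, _ in obs}
    for _ in range(epochs):
        for noun, slot, target in obs:
            err = score(x, w, b, noun, slot) - target
            b[slot] -= lr * err
            for k in range(dim):
                # Compute both gradients before updating either parameter.
                grad_w = err * x[noun][k] + reg * w[slot][k]
                grad_x = err * w[slot][k] + reg * x[noun][k]
                w[slot][k] -= lr * grad_w
                x[noun][k] -= lr * grad_x
    return x, w, b

x, w, b = fit(OBS)
# Compare an unobserved but plausible triple with an observed implausible one.
print(score(x, w, b, "ocha", ("wo", "sosogu")), score(x, w, b, "pan", ("wo", "sosogu")))
```

Because the noun vectors are shared across all slots, a noun that behaves like "mizu" in the observed slots tends to receive a similar score in a slot where it was never observed, which is the kind of generalization to unseen combinations the abstract describes without relying on a thesaurus.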