Interpretation of compound nominalisations using corpus and web statistics

Authors:
Jeremy Nicholson; Timothy Baldwin
Affiliations:
University of Melbourne, Australia; University of Melbourne, Australia
Venue:
MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
Year:
2006

Abstract

We present two novel paraphrase tests for automatically predicting the inherent semantic relation of a given compound nominalisation as one of subject, direct object, or prepositional object. We compare these to the usual verb-argument paraphrase test, using both corpus statistics and frequencies obtained by scraping the Google search engine interface. We also implemented a statistical measure more robust than maximum likelihood estimation: the confidence interval. We achieved a significant reduction in data sparseness, but this alone is insufficient to provide a substantial performance improvement.
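To illustrate the contrast between maximum likelihood estimation and a confidence-interval decision, here is a minimal Python sketch. It is not the method from the paper. The abstract does not say which interval is used, so the Wilson score interval here is an assumption. The function names, the relation labels, and the decision rule are also hypothetical: a relation is predicted only when its lower bound exceeds every competitor's upper bound. The counts stand in for paraphrase frequencies drawn from a corpus or from web hit counts.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed choice of interval)."""
    if total == 0:
        return (0.0, 1.0)  # no evidence: the proportion could be anything
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def predict_mle(counts):
    """Pick the relation with the highest observed paraphrase frequency."""
    if sum(counts.values()) == 0:
        return None
    return max(counts, key=counts.get)


def predict_with_interval(counts, z=1.96):
    """Pick a relation only if its interval is separated from all others.

    Hypothetical decision rule: the top relation's lower bound must exceed
    the upper bound of every competing relation; otherwise abstain (None).
    """
    total = sum(counts.values())
    if total == 0:
        return None
    best = max(counts, key=counts.get)
    best_lower = wilson_interval(counts[best], total, z)[0]
    rival_upper = max(
        (wilson_interval(c, total, z)[1] for rel, c in counts.items() if rel != best),
        default=0.0,
    )
    return best if best_lower > rival_upper else None


# Hypothetical paraphrase counts for two compound nominalisations.
well_attested = {"subject": 3, "direct_object": 40, "prepositional_object": 2}
sparse = {"subject": 1, "direct_object": 2, "prepositional_object": 0}
```

With the well-attested counts both functions choose `direct_object`. With the sparse counts `predict_mle` still commits to `direct_object` on the strength of two observations, while `predict_with_interval` abstains because the intervals overlap. This shows one way an interval-based measure can be more cautious than a raw frequency ratio when data are sparse.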