Search engine statistics beyond the n-gram: application to noun compound bracketing

Authors:
Preslav Nakov; Marti Hearst
Affiliations:
University of California, Berkeley, Berkeley, CA; University of California, Berkeley, Berkeley, CA
Venue:
CoNLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Year:
2005

References (6)

Selection and information: a class-based approach to lexical relationships

Theory of Syntactic Recognition for Natural Languages

A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

Using the web to obtain frequencies for unseen bigrams
Computational Linguistics - Special issue on web as corpus

Lexical semantic techniques for corpus analysis
Computational Linguistics - Special issue on using large corpora: II

On the semantics of noun compounds
Computer Speech and Language

Cited by (25)

Towards a base noun phrase parser using Web counts
Journal of Computing Sciences in Colleges

Out-of-context noun phrase semantic interpretation with cross-linguistic evidence
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Using the web as an implicit training set: application to structural ambiguity resolution
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Googleology is Bad Science
Computational Linguistics

Unsupervised Method for Parsing Coordinated Base Noun Phrases
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing

Determining the syntactic structure of medical terms in clinical notes
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

The syntax and semantics of prepositions in the task of automatic interpretation of nominal phrases and compounds: A cross-linguistic study
Computational Linguistics

Using web-search results to measure word-group similarity
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Interpretation of compound nominalisations using corpus and web statistics
MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties

UCB system description for the WMT 2007 shared task
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Experiments with an annotation scheme for a knowledge-rich noun phrase interpretation system
LAW '07 Proceedings of the Linguistic Annotation Workshop

Exploring web scale language models for search query processing
Proceedings of the 19th International Conference on World Wide Web

A knowledge-rich approach to identifying semantic relations between nominals
Information Processing and Management: an International Journal

A taxonomy, dataset, and classifier for automatic noun compound interpretation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Creating robust supervised classifiers via web-scale N-gram data
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

UvT: Memory-based pairwise ranking of paraphrasing verbs
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

Predicting the semantic compositionality of prefix verbs
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Using web-scale N-grams to improve base NP parsing performance
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Web-scale features for full-scale parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Exploiting web-derived selectional preference to improve statistical dependency parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Parsing noun phrases in the penn treebank
Computational Linguistics

Using verbs to characterize noun-noun relations
AIMSA'06 Proceedings of the 12th international conference on Artificial Intelligence: methodology, Systems, and Applications

Splitting noun compounds via monolingual and bilingual paraphrasing: a study on Japanese katakana words
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Automated functional testing of online search services
Software Testing, Verification & Reliability

Extraction of multi-word expressions from small parallel corpora
Natural Language Engineering

Abstract

In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing that extracts statistics from Web search engines using the χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves an accuracy of 89.34% (baseline: 66.80%), a sizable improvement over the previous state of the art (80.70%).
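The χ² component mentioned in the abstract can be illustrated with a minimal sketch. It assumes a dependency-style comparison for a three-word compound "w1 w2 w3" (does w1 associate more strongly with w2, giving left bracketing [[w1 w2] w3], or with w3, giving right bracketing [w1 [w2 w3]]?), exact-phrase page-hit counts as frequencies, and a hypothetical total page count N. The function names, the tie-breaking rule, and all counts below are illustrative assumptions, not the paper's implementation or data.

```python
from typing import Mapping


def chi_square(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pearson's chi-squared statistic for a 2x2 table built from hit counts.

    a = #(x y)            b = #(x, not y)
    c = #(not x, y)       d = #(neither x nor y)
    Note: chi-squared measures deviation from independence in either
    direction, so strong negative association also scores high.
    """
    a = n_xy
    b = n_x - n_xy
    c = n_y - n_xy
    d = n_total - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: treat as no evidence of association
    return n_total * (a * d - b * c) ** 2 / denom


def bracket_dependency(w1: str, w2: str, w3: str,
                       hits: Mapping[str, int], n_total: int) -> str:
    """Hypothetical helper: bracket 'w1 w2 w3' by comparing chi-squared scores.

    Left  [[w1 w2] w3]: w1 modifies w2, so the pair w1-w2 should associate more.
    Right [w1 [w2 w3]]: w1 modifies w3, so the pair w1-w3 should associate more.
    """
    left = chi_square(hits[f"{w1} {w2}"], hits[w1], hits[w2], n_total)
    right = chi_square(hits[f"{w1} {w3}"], hits[w1], hits[w3], n_total)
    # Ties go to left bracketing (an assumption of this sketch).
    return "left" if left >= right else "right"


# Hypothetical page-hit counts; a real system would obtain them from search queries.
N = 1_000_000
hits = {
    "brain": 5000, "stem": 2000, "cell": 8000,
    "brain stem": 500, "brain cell": 100,
}
print(bracket_dependency("brain", "stem", "cell", hits, N))  # left
```

Only the χ² comparison is shown here; the surface features and paraphrases described in the abstract would supply additional, independent evidence for the same left/right decision.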