Detecting novel compounds: the role of distributional evidence

Authors:
Mirella Lapata; Alex Lascarides
Affiliations:
University of Sheffield, Regent Court, Sheffield, UK; The University of Edinburgh, Edinburgh, UK
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

Research on the discovery of terms from corpora has focused on word sequences whose recurrent occurrence in a corpus is indicative of their terminological status, and has not addressed the issue of discovering terms when data is sparse. This becomes apparent in the case of noun compounding, which is extremely productive: more than half of the candidate compounds extracted from a corpus are attested only once. We show how evidence about established (i.e., frequent) compounds can be used to estimate features that discriminate rare valid compounds from rare nonce terms, in addition to a variety of linguistic features that can be easily gleaned from corpora without relying on parsed text.
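The general idea of borrowing evidence from established compounds can be illustrated with a minimal sketch. It is not the feature set used in the paper: the counts are invented toy data, the frequency threshold `min_count` is an arbitrary assumption, and the two features (`modifier_support`, `head_support`) are hypothetical names for how often a rare candidate's modifier and head occur in the same positions within frequent compounds.

```python
from collections import Counter

# Toy counts of candidate (modifier, head) pairs; invented for illustration only.
compound_counts = Counter({
    ("car", "park"): 3,
    ("car", "door"): 2,
    ("bus", "door"): 2,
    ("shop", "window"): 2,
    ("car", "window"): 1,   # rare, but plausibly a valid compound
    ("tree", "cloud"): 1,   # rare, plausibly a nonce sequence
})


def split_by_frequency(counts, min_count=2):
    """Separate established compounds from rare ones.

    min_count is an assumed threshold, not a value taken from the paper.
    """
    established = {c: n for c, n in counts.items() if n >= min_count}
    rare = [c for c, n in counts.items() if n < min_count]
    return established, rare


def distributional_features(candidate, established):
    """Estimate features for a rare candidate from established compounds.

    modifier_support: total count of established compounds sharing the modifier.
    head_support: total count of established compounds sharing the head.
    """
    modifier, head = candidate
    modifier_support = sum(n for (m, _), n in established.items() if m == modifier)
    head_support = sum(n for (_, h), n in established.items() if h == head)
    return {"modifier_support": modifier_support, "head_support": head_support}


established, rare = split_by_frequency(compound_counts)
features = {c: distributional_features(c, established) for c in rare}
```

In this toy setting, "car window" receives non-zero support for both its modifier and its head, while "tree cloud" receives none. Features of this kind could then be passed to a classifier alongside other linguistic features.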