Extracting the unextractable: a case study on verb-particles

Authors:
Timothy Baldwin; Aline Villavicencio
Affiliations:
Stanford University, Stanford, CA; University of Cambridge, Cambridge, UK
Venue:
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Year:
2002

Abstract

This paper proposes a series of techniques for extracting English verb-particle constructions from raw text corpora. We first propose three basic methods, based on tagger output, chunker output, and a chunk grammar, respectively. The chunk grammar method can optionally be combined with an attachment resolution module that determines the syntactic structure of verb-preposition pairs in ambiguous constructs. We then combine the three methods into a single classifier and add a number of extra lexical and frequentistic features, producing a final F-score of 0.865 over the WSJ.
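
To make the tagger-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes Penn Treebank-style tags in which particles are tagged RP and verbs carry a VB prefix, and it pairs each particle with the nearest preceding verb inside a small window. The function names, the `max_gap` window, and the toy sentence are all hypothetical; the F-score helper simply shows how a figure such as 0.865 is computed from precision and recall.

```python
from typing import List, Set, Tuple

# A sentence is a list of (token, POS tag) pairs. Penn Treebank-style tags
# are an assumption of this sketch: RP = particle, VB* = verb forms.
Tagged = List[Tuple[str, str]]
Pair = Tuple[str, str]


def extract_vpc_candidates(sentence: Tagged, max_gap: int = 4) -> Set[Pair]:
    """Pair each RP-tagged token with the closest verb in the max_gap tokens before it."""
    candidates: Set[Pair] = set()
    for i, (word, tag) in enumerate(sentence):
        if tag != "RP":
            continue
        # Scan leftwards from the particle, at most max_gap tokens.
        for j in range(i - 1, max(i - 1 - max_gap, -1), -1):
            prev_word, prev_tag = sentence[j]
            if prev_tag.startswith("VB"):
                # No lemmatisation here: the surface verb form is kept.
                candidates.add((prev_word.lower(), word.lower()))
                break
    return candidates


def f_score(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy = [("She", "PRP"), ("handed", "VBD"), ("the", "DT"),
           ("paper", "NN"), ("in", "RP"), (".", ".")]
    found = extract_vpc_candidates(toy)
    print(found)
    print(f_score(found, {("handed", "in")}))
```

A real extractor would also need lemmatisation and some way to separate particles from prepositions when the tagger is unsure, which is the ambiguity the attachment resolution module in the abstract is meant to address.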