Detecting complex predicates in Hindi using POS projection across parallel corpora

Authors:
Amitabha Mukerjee;Ankit Soni;Achla M. Raina
Affiliations:
Indian Institute of Technology Kanpur, Kanpur, India;Indian Institute of Technology Kanpur, Kanpur, India;Indian Institute of Technology Kanpur, Kanpur, India
Venue:
MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
Year:
2006

Abstract

Complex predicates (CPs) are multiword complexes that function as single verbal units. CPs are particularly pervasive in Hindi and other Indo-Aryan languages, but a usage account driven by corpus-based identification of these constructs has not been possible, because single-language systems based on rules and statistical approaches require reliable tools (POS taggers, parsers, etc.) that are unavailable for Hindi. This paper describes the development of the first such database, based on the simple idea of projecting POS tags across an English-Hindi parallel corpus. The CP types considered include adjective-verb (AV), noun-verb (NV), adverb-verb (Adv-V), and verb-verb (VV) composites. CPs are hypothesized where a verb in English is projected onto a multi-word sequence in Hindi. While this process misses some CPs, those that are detected appear to be more reliable (83% precision, 46% recall). The resulting database lists usage instances of 1439 CPs in 4400 sentences.
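
The projection heuristic in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the English side is already tagged with Penn Treebank tags, that word alignments are available as (English index, Hindi index) pairs, and that a candidate must cover a contiguous Hindi span; the contiguity check is an added assumption, not something the abstract states. The function name `hypothesize_cps` and the transliterated example sentence are hypothetical and are not taken from the paper's corpus.

```python
from collections import defaultdict


def hypothesize_cps(en_tagged, hi_tokens, alignment):
    """Return (english_verb, hindi_span) candidates.

    en_tagged : list of (word, POS tag) pairs for the English sentence
    hi_tokens : list of Hindi tokens (transliterated here)
    alignment : list of (english_index, hindi_index) word-alignment links
    """
    # Collect, for every English token, the Hindi positions it is aligned to.
    targets = defaultdict(list)
    for en_idx, hi_idx in alignment:
        targets[en_idx].append(hi_idx)

    candidates = []
    for en_idx, (word, tag) in enumerate(en_tagged):
        # Penn Treebank verb tags all start with "VB" (VB, VBD, VBZ, ...).
        if not tag.startswith("VB"):
            continue
        hi_idxs = sorted(set(targets.get(en_idx, [])))
        # A CP is hypothesized only when the verb maps to several Hindi words.
        if len(hi_idxs) < 2:
            continue
        # Assumption of this sketch: the Hindi words must be adjacent.
        if hi_idxs[-1] - hi_idxs[0] != len(hi_idxs) - 1:
            continue
        candidates.append((word, tuple(hi_tokens[i] for i in hi_idxs)))
    return candidates


# Illustrative (made-up) sentence pair: "He helped me" / "usne merii madad kii".
en_tagged = [("He", "PRP"), ("helped", "VBD"), ("me", "PRP")]
hi_tokens = ["usne", "merii", "madad", "kii"]
alignment = [(0, 0), (1, 2), (1, 3), (2, 1)]

print(hypothesize_cps(en_tagged, hi_tokens, alignment))
```

In this toy pair the English verb "helped" is aligned to the two Hindi words "madad kii", so that span is returned as a candidate noun-verb CP. An English verb aligned to a single Hindi word yields no candidate, which is one way such a scheme can miss CPs.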