Mining complex predicates in Hindi using a parallel Hindi-English corpus

Authors:
R. Mahesh K. Sinha
Affiliations:
Indian Institute of Technology, Kanpur, India
Venue:
MWE '09 Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications
Year:
2009

Abstract

A complex predicate (CP) is a noun, verb, adjective, or adverb followed by a light verb, where the combination behaves as a single verbal unit. CPs are used abundantly in Hindi and other languages of the Indo-Aryan family. Detecting and interpreting CPs is an important and fairly difficult task, and linguistic and statistical methods have so far had limited success in mining them. In this paper, we present a simple method for detecting CPs of all kinds using a Hindi-English parallel corpus. A CP is hypothesized when the conventional meaning of the light verb is absent from the aligned English sentence. This simple strategy exploits the fact that a CP is a multiword expression whose meaning is distinct from that of the light verb. Despite several shortcomings in the methodology, this empirical method surprisingly mines CPs with an average precision of 89% and a recall of 90%.
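The detection idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the Hindi sentence is already romanized and lemmatized, that the light verb is immediately preceded by its host word, and it uses a tiny hypothetical lexicon (`LIGHT_VERB_GLOSSES`) listing the English words that express each light verb's literal meaning. The function names and example sentences are likewise illustrative.

```python
import re

# Hypothetical light-verb lexicon (illustrative, not from the paper):
# Hindi light-verb lemma -> English words (with inflections) expressing
# its conventional, literal meaning.
LIGHT_VERB_GLOSSES = {
    "kar": {"do", "does", "did", "done", "doing",
            "make", "makes", "made", "making"},
    "de": {"give", "gives", "gave", "given", "giving"},
    "le": {"take", "takes", "took", "taken", "taking"},
}


def english_tokens(sentence: str) -> set[str]:
    """Lowercased word tokens of the aligned English sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def hypothesize_cps(hindi_lemmas: list[str], english: str) -> list[tuple[str, str]]:
    """Return (host word, light verb) pairs hypothesized to be complex predicates.

    A light verb preceded by a host word is flagged as part of a CP when none
    of the English words for its literal meaning occur in the aligned sentence.
    """
    eng = english_tokens(english)
    candidates = []
    for i in range(1, len(hindi_lemmas)):
        verb = hindi_lemmas[i]
        glosses = LIGHT_VERB_GLOSSES.get(verb)
        if glosses is None:
            continue  # not a light verb in this toy lexicon
        if not glosses & eng:
            # Literal meaning absent from the translation: hypothesize a CP.
            candidates.append((hindi_lemmas[i - 1], verb))
    return candidates


# "shuru kar" (to start): the English side has no "do"/"make".
print(hypothesize_cps(["usne", "kaam", "shuru", "kar"], "He started the work."))
# The literal "do" survives in the translation, so no CP is hypothesized.
print(hypothesize_cps(["usne", "kaam", "kar"], "He did the work."))
```

Because the test is only the absence of the light verb's gloss, a free translation that happens to drop or add a word such as "do" or "give" can flip the decision, which is why the output is treated as a hypothesis rather than a final judgment.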