This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) in Urdu from a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted with an unsupervised method and classified into two classes: locations and person names. The classification relies on simple heuristics that take into account how the MWEs co-occur with distinct postpositions. Evaluated against a hand-annotated gold standard, the resulting classes achieve F-scores of 0.5 for locations and 0.746 for persons. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
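As a rough illustration of a postposition-based heuristic of the kind the abstract mentions, the following minimal Python sketch labels a candidate MWE by counting which postpositions directly follow its occurrences. The abstract does not list the postpositions or the decision rule, so the transliterated postposition sets, the majority-vote rule, and the function name `classify_mwe` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter

# Assumed, illustrative postposition sets (transliterated Urdu).
# The source does not specify which postpositions signal which class.
LOCATION_POSTPOSITIONS = {"mein", "tak"}   # roughly "in", "up to"
PERSON_POSTPOSITIONS = {"ne", "ko"}        # ergative and dative/accusative markers


def classify_mwe(following_tokens):
    """Label an MWE from the tokens seen directly after its occurrences.

    following_tokens: one token per corpus occurrence of the MWE.
    Returns "location", "person", or "unknown" on a tie or no evidence.
    """
    counts = Counter(following_tokens)
    location_votes = sum(counts[p] for p in LOCATION_POSTPOSITIONS)
    person_votes = sum(counts[p] for p in PERSON_POSTPOSITIONS)
    if location_votes == person_votes:
        return "unknown"
    return "location" if location_votes > person_votes else "person"


if __name__ == "__main__":
    # Hypothetical observations for two candidate MWEs.
    print(classify_mwe(["mein", "mein", "ko"]))
    print(classify_mwe(["ne", "ko", "mein"]))
```

A simple vote like this needs no annotated training data, which fits an unsupervised setting. It is also sensitive to postpositions that attach to both classes, which may be one reason for the gap between the location and person scores.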