Using cross-lingual projections to generate semantic role labeled corpus for Urdu: a resource poor language

Authors:
Smruthi Mukund;Debanjan Ghosh;Rohini K. Srihari
Affiliations:
University at Buffalo;Thomson Reuters R&D;University at Buffalo
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Year:
2010

Abstract

In this paper, we explore the use of cross-lingual projections to automatically induce role-semantic annotations in the PropBank paradigm for Urdu, a resource-poor language. This technique projects annotations based on word alignments. It is relatively inexpensive and has the potential to reduce the human effort involved in creating semantic role resources. The projection model exploits lexical as well as syntactic information on an English-Urdu parallel corpus. We show that our method generates reasonably good annotations, with an accuracy of 92% on short structured sentences. Using the automatically generated annotated corpus, we conduct preliminary experiments to create a semantic role labeler for Urdu. The results of the labeler, though modest, are promising and indicate the potential of our technique to generate large-scale annotations for Urdu.
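
To make the idea of projecting annotations through word alignments concrete, the following is a minimal sketch, not the authors' model. It assumes token-level PropBank-style labels on the English side, a precomputed list of (English index, Urdu index) alignment pairs, and a simple majority vote when several English tokens align to one Urdu token. The function name `project_roles`, the `"O"` label for unlabeled tokens, and the romanized example sentence are all hypothetical choices made for illustration; the lexical and syntactic information the paper's model exploits is not represented here.

```python
from collections import defaultdict


def project_roles(src_roles, alignment, tgt_len):
    """Project token-level role labels from source to target tokens.

    src_roles: one label per source token ("O" means no role).
    alignment: iterable of (src_idx, tgt_idx) word-alignment pairs.
    tgt_len:   number of target tokens.
    """
    # Collect every source label that reaches each target token.
    votes = defaultdict(list)
    for s, t in alignment:
        if 0 <= s < len(src_roles) and 0 <= t < tgt_len:
            label = src_roles[s]
            if label != "O":
                votes[t].append(label)

    projected = []
    for t in range(tgt_len):
        labels = votes.get(t)
        if not labels:
            # Unaligned target tokens receive no role.
            projected.append("O")
        else:
            # Majority vote; ties go to the label seen first.
            projected.append(max(labels, key=labels.count))
    return projected


# Illustrative (hypothetical) sentence pair:
# English: "John ate an apple"  ->  Urdu (romanized): "John ne seb khaya"
english_roles = ["ARG0", "PRED", "ARG1", "ARG1"]
alignment = [(0, 0), (1, 3), (2, 2), (3, 2)]
urdu_roles = project_roles(english_roles, alignment, tgt_len=4)
print(urdu_roles)  # ['ARG0', 'O', 'ARG1', 'PRED']
```

In this toy example the case marker "ne" has no English counterpart, so it stays unlabeled. Gaps of this kind are one reason a projection based on alignments alone can be noisy on longer sentences.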