Dependency parsing and projection based on word-pair classification
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Cross-lingual projection faces two major challenges: noise from word-alignment errors and syntactic divergence between the two languages. To address these problems, a semi-supervised learning framework for cross-lingual projection is proposed that obtains better annotations from parallel data. A projection model is introduced to model the process of projecting labels from the resource-rich language to the resource-scarce language. Together with the traditional target-language model of cross-lingual projection, the projection model provides a second view of the parallel data. Exploiting these two views, an extension of the co-training algorithm to structured prediction is designed to boost the performance of both models. Experiments show that the proposed cross-lingual projection method improves accuracy on the POS-tagging projection task, and that using only one-to-one alignments yields more accurate results than using all alignment links.
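The one-to-one alignment filtering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data, tag names, and function names are all hypothetical, and the co-training loop and the two statistical models are omitted.

```python
# Hypothetical sketch of POS-tag projection restricted to one-to-one
# alignment links, as the abstract reports this outperforms using all links.
from collections import Counter


def one_to_one_alignments(alignments):
    """Keep only links whose source and target positions each occur
    exactly once across all links (i.e., strictly one-to-one links)."""
    src_counts = Counter(s for s, _ in alignments)
    tgt_counts = Counter(t for _, t in alignments)
    return [(s, t) for s, t in alignments
            if src_counts[s] == 1 and tgt_counts[t] == 1]


def project_tags(src_tags, tgt_len, alignments):
    """Project source-language POS tags onto target positions via
    one-to-one links; unaligned target words stay None, to be filled
    in by the target-language model."""
    tgt_tags = [None] * tgt_len
    for s, t in one_to_one_alignments(alignments):
        tgt_tags[t] = src_tags[s]
    return tgt_tags


# Illustrative example: source word 1 aligns to two target words and
# target word 2 has two source words, so those noisy links are dropped.
src_tags = ["DET", "NOUN", "VERB"]
alignments = [(0, 0), (1, 1), (1, 2), (2, 2)]
print(project_tags(src_tags, 3, alignments))  # ['DET', None, None]
```

Dropping one-to-many and many-to-one links discards some supervision, but it also discards exactly the links most likely to carry word-alignment noise, which matches the abstract's empirical finding.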