Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

Authors:
Alexandre Klementiev;Dan Roth
Affiliations:
University of Illinois, Urbana, IL;University of Illinois, Urbana, IL
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 11
Cited 29

Learning Boolean Functions in an Infinite Attribute Space

Machine Learning
Learning to resolve natural language ambiguities: a unified approach

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Learning in Natural Language

IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Statistical transliteration for english-arabic cross language information retrieval

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Machine transliteration

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
An English to Korean transliteration model of extended Markov window

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Named entity discovery using comparable news articles

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
A discriminative framework for bilingual word alignment

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Identification and tracing of ambiguous names: discriminative and generative approaches

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence

Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Weakly-supervised discovery of named entities using web search queries

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
An Unsupervised Learning Algorithm for Rank Aggregation

ECML '07 Proceedings of the 18th European conference on Machine Learning
Mining named entity transliteration equivalents from comparable corpora

Proceedings of the 17th ACM conference on Information and knowledge management
Using English information in non-English web search

Proceedings of the 2nd ACM workshop on Improving non english web searching
On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages

Information Retrieval
Low-Cost Supervision for Multiple-Source Attribute Extraction

CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Active sample selection for named entity transliteration

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
MINT: a method for effective and scalable mining of named entity transliterations from large comparable corpora

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Feature-based method for document alignment in comparable news corpora

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Transliteration as constrained optimization

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised constraint driven learning for transliteration discovery

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Building a semantic lexicon of English nouns via bootstrapping

SRWS '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Learning better transliterations

Proceedings of the 18th ACM conference on Information and knowledge management
Report of NEWS 2009 machine transliteration shared task

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Transliteration for Resource-Scarce Languages

ACM Transactions on Asian Language Information Processing (TALIP)
Report of NEWS 2010 transliteration generation shared task

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Report of NEWS 2010 transliteration mining shared task

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Enhancing mention detection using projection via aligned corpora

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Machine transliteration survey

ACM Computing Surveys (CSUR)
EM-based hybrid model for bilingual terminology extraction from comparable corpora

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Machine transliteration: leveraging on third languages

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
An algorithm for unsupervised transliteration mining with an application to word alignment

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
From bilingual dictionaries to interlingual document representations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Improving bilingual projections via sparse covariance matrices

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Toward statistical machine translation without parallel corpora

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Regularized interlingual projections: evaluation on multilingual transliteration

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Name phylogeny: a generative model of string variation

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Report of NEWS 2012 machine transliteration shared task

NEWS '12 Proceedings of the 4th Named Entity Workshop

Quantified Score

Hi-index	0.00

Visualization

Abstract

Named Entity recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource free language, given a bilingual corpora in which it is weakly temporally aligned with a resource rich language. NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency based, metric for time distributions and a resource free discriminative approach to transliteration. Seeded with a small number of transliteration pairs, our algorithm discovers multi-word NEs, and takes advantage of a dictionary (if one exists) to account for translated or partially translated NEs. We evaluate the algorithm on an English-Russian corpus, and show high level of NEs discovery in Russian.