An efficient method for determining bilingual word classes

Authors:
Franz Josef Och
Affiliations:
University of Technology, Aachen, Germany
Venue:
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Year:
1999

Abstract

In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes, a standard method in statistical language modeling. In this paper we describe a method for determining bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum-likelihood approach and describe a clustering algorithm. We show that using the resulting bilingual word classes can improve statistical machine translation.