A statistical approach to machine translation
Computational Linguistics
A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3
A statistical approach to language translation
COLING '88 Proceedings of the 12th conference on Computational linguistics - Volume 1
Handbook of Mathematical Functions, With Formulas, Graphs, and Mathematical Tables
Translating collocations for bilingual lexicons: a statistical approach
Computational Linguistics
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Alignment and Matching of Bilingual English–Chinese News Texts
Machine Translation
Review Article: Example-based Machine Translation
Machine Translation
Using Corpus-Based Approaches in a System for Multilingual Information Retrieval
Information Retrieval
World Wide Web - A Multilingual Language Resource
WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
Multilingual Information Retrieval Based on Document Alignment Techniques
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
The Challenge of Parallel Text Processing
TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
A Multilingual Procedure for Dictionary-Based Sentence Alignment
AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Multilingual Information Retrieval Based on Parallel Texts from the Web
CLEF '00 Revised Papers from the Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation
Cross-language information retrieval: experiments based on CLEF 2000 corpora
Information Processing and Management: an International Journal
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Adaptive multilingual sentence boundary disambiguation
Computational Linguistics
A class-based approach to word alignment
Computational Linguistics
The automatic translation of discourse structures
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Adaptive sentence boundary disambiguation
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
High-performance bilingual text alignment using statistical and dictionary information
Natural Language Engineering
An experiment in hybrid dictionary and statistical sentence alignment
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Flow network models for word alignment and terminology extraction from bilingual corpora
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
High-performance bilingual text alignment using statistical and dictionary information
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Structural feature selection for English-Korean statistical machine translation
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Bilingual text matching using bilingual dictionary and statistics
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Building an MT dictionary from parallel texts based on linguistic and statistical information
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
An IBM-PC environment for Chinese corpus analysis
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Derivation of underlying valency frames from a learner's dictionary
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Aligning more words with high precision for small bilingual corpora
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Translating unknown cross-lingual queries in digital libraries using a web-based approach
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Mixed language query disambiguation
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Creating a multilingual collocation dictionary from large text corpora
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Linguistic variation and computation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Translation Disambiguation in Mixed Language Queries
Machine Translation
A cheap and fast way to build useful translation lexicons
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A robust cross-style bilingual sentences alignment model
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Reliable measures for aligning Japanese-English news articles and sentences
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Evaluation challenges in large-scale document summarization
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Extracting significant words from corpora for ontology extraction
Proceedings of the 3rd international conference on Knowledge capture
Comparative study of monolingual and multilingual search models for use with Asian languages
ACM Transactions on Asian Language Information Processing (TALIP)
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Aligning and using an English-Inuktitut parallel corpus
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Construction and analysis of Japanese-English broadcast news corpus with named entity tags
MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
Exploiting the Web as the multilingual corpus for unknown query translation
Journal of the American Society for Information Science and Technology
Automatic extraction of bilingual word pairs using inductive chain learning in various languages
Information Processing and Management: an International Journal
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Multi-level bootstrapping for extracting parallel sentences from a quasi-comparable corpus
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Robust sub-sentential alignment of phrase-structure trees
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Sentence alignment using P-NNT and GMM
Computer Speech and Language
ATLAS: a new text alignment architecture
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Statistical machine translation
ACM Computing Surveys (CSUR)
Critical Edition of Sanskrit Texts
Sanskrit Computational Linguistics
Constructing Parallel Corpus from Movie Subtitles
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Translating medical terminologies through word alignment in parallel text corpora
Journal of Biomedical Informatics
On the use of comparable corpora to improve SMT performance
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Improved sentence alignment on parallel web pages using a stochastic tree alignment model
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Pseudo-aligned multilingual corpora
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Using normalized alignment scores to detect incorrectly aligned segments
Proceedings of the 2nd international workshop on Patent information retrieval
Nukti: English-Inuktitut word alignment system description
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Partitioning parallel documents using binary segmentation
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Exploiting comparable corpora with TER and TERp
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Bilingual concordancers and translation memories: a comparative evaluation
LRTWRT '04 Proceedings of the Second International Workshop on Language Resources for Translation Work, Research and Training
Aligning Portuguese and Chinese parallel texts using confidence bands
PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
Selecting target word using contexonym comparison method
Proceedings of the 2007 conference on Human interface: Part I
Local context selection for aligning sentences in parallel corpora
CONTEXT'07 Proceedings of the 6th international and interdisciplinary conference on Modeling and using context
Context-based sentence alignment in parallel corpora
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
BabelNet: building a very large multilingual semantic network
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
LetsMT! -- Online Platform for Sharing Training Data and Building User Tailored Machine Translation
Proceedings of the 2010 conference on Human Language Technologies -- The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010
Consistency checking for Treebank alignment
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
Text-based English-Arabic sentence alignment
ICIC'06 Proceedings of the 2006 international conference on Intelligent computing: Part II
Evaluation of axiomatic approaches to crosslanguage retrieval
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Using parallel corpora for multilingual (multi-document) summarisation evaluation
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
A survey of paraphrasing and textual entailment methods
Journal of Artificial Intelligence Research
Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Using SRX standard for sentence segmentation
LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
Improvement of machine translation evaluation by simple linguistically motivated features
Journal of Computer Science and Technology - Special issue on natural language processing
ParaSense or how to use parallel corpora for word sense disambiguation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Building a web-based parallel corpus and filtering out machine-translated text
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
An evaluation and possible improvement path for current SMT behavior on ambiguous nouns
SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Graph-based bilingual sentence alignment from large scale web pages
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
Applied Intelligence
Parallel sentence generation from comparable corpora for improved SMT
Machine Translation
Evaluation of alignment methods for HTML parallel text
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
An unsupervised alignment algorithm for text simplification corpus construction
MTTG '11 Proceedings of the Workshop on Monolingual Text-To-Text Generation
Probabilistic neural network based English-Arabic sentence alignment
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Automatic filtering of bilingual corpora for statistical machine translation
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Combining sentence length with location information to align monolingual parallel texts
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Weighted finite-state transducer inference for limited-domain speech-to-speech translation
PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
Enabling users to create their own web-based machine translation engine
Proceedings of the 21st international conference companion on World Wide Web
Extracting parallel paragraphs and sentences from English-Persian translated documents
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Analyzing parallelism and domain similarities in the MAREC patent corpus
IRFC'12 Proceedings of the 5th conference on Multidisciplinary Information Retrieval
Generalized biwords for bitext compression and translation spotting
Journal of Artificial Intelligence Research
Design of a hybrid high quality machine translation system
EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
LetsMT!: a cloud-based platform for do-it-yourself machine translation
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Machine translation for multilingual summary content evaluation
Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization
Application of clause alignment for statistical machine translation
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Effective and efficient?: bilingual sentiment lexicon extraction using collocation alignment
Proceedings of the 21st ACM international conference on Information and knowledge management
A Fast and Accurate Method for Bilingual Opinion Lexicon Extraction
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
How many multiword expressions do people know?
ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 1
Manifold alignment preserving global geometry
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Identifying useful human correction feedback from an on-line machine translation service
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Generating storylines from sensor data
Pervasive and Mobile Computing
Researchers in both machine translation (e.g., Brown et al. 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (such as French and English). One useful step is to align the sentences, that is, to identify correspondences between sentences in one language and sentences in the other language.

This paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences. A probabilistic score is assigned to each proposed correspondence of sentences, based on the scaled difference of lengths of the two sentences (in characters) and the variance of this difference. This probabilistic score is used in a dynamic programming framework to find the maximum likelihood alignment of sentences.

It is remarkable that such a simple approach works as well as it does. An evaluation was performed based on a trilingual corpus of economic reports issued by the Union Bank of Switzerland (UBS) in English, French, and German. The method correctly aligned all but 4% of the sentences. Moreover, it is possible to extract a large subcorpus that has a much smaller error rate. By selecting the best-scoring 80% of the alignments, the error rate is reduced from 4% to 0.7%. There were more errors on the English-French subcorpus than on the English-German subcorpus, showing that error rates will depend on the corpus considered; however, both were small enough to hope that the method will be useful for many language pairs.

To further research on bilingual corpora, a much larger sample of Canadian Hansards (approximately 90 million words, half in English and half in French) has been aligned with the align program and will be available through the Data Collection Initiative of the Association for Computational Linguistics (ACL/DCI). In addition, in order to facilitate replication of the align program, an appendix is provided with detailed C code of the more difficult core of the align program.
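
The length-based scoring and dynamic programming search described above can be illustrated with a minimal Python sketch. It is not the C code from the appendix. The function names (`length_cost`, `align_by_length`), the constants `C` and `S2`, and the prior probabilities in `PRIORS` are assumptions chosen for illustration; the abstract does not state them. Each proposed correspondence is scored by the scaled difference of character lengths under a normal model, and the cheapest sequence of correspondences is then found by dynamic programming.

```python
import math

# Illustrative parameters: assumptions, not values stated in the abstract.
C = 1.0    # assumed expected number of target characters per source character
S2 = 6.8   # assumed variance of the length difference per source character
# Hypothetical prior probabilities for each pattern (source count, target count).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}


def length_cost(len_src, len_tgt):
    """Negative log of the two-sided normal tail probability of the scaled length difference."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # P(|Z| >= |delta|) for a standard normal Z, via the complementary error function.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(tail, 1e-300))  # clamp to avoid log(0)


def align_by_length(src_lens, tgt_lens):
    """Return a list of (source indices, target indices, cost), one entry per correspondence."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming: cell (i, j) covers the first i source
    # and the first j target sentences.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = (length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                        - math.log(prior))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj, step)
    # Trace the cheapest path back from the end of both texts.
    result = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj, step = back[i][j]
        result.append((list(range(i - di, i)), list(range(j - dj, j)), step))
        i, j = i - di, j - dj
    result.reverse()
    return result


if __name__ == "__main__":
    # Character lengths of three source and two target sentences (made-up data).
    for src, tgt, step in align_by_length([50, 50, 80], [101, 82]):
        print(src, tgt, round(step, 2))
```

Because each correspondence carries its own cost, sorting the returned list by cost and keeping only the lowest-cost fraction is one way to extract a best-scoring subset in the spirit of the 80% selection mentioned above.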