Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Adaptive Parallel Sentences Mining from Web Bilingual News Collection
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Computational Linguistics - Special issue on web as corpus
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora
Computational Linguistics
Automatic construction of parallel English-Chinese corpus for cross-language information retrieval
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Discovering parallel text from the World Wide Web
ACSW Frontiers '04 Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Loosely tree-based alignment for machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Comparison, selection and use of sentence alignment algorithms for new language pairs
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Extracting parallel paragraphs and sentences from english-persian translated documents
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Hi-index | 0.00 |
Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages. Parallel web pages tend to have parallel structures, and the structural correspondence can be indicative information for identifying parallel sentences. In our approach, the web page is represented as a tree, and a stochastic tree alignment model is used to exploit the structural correspondence for sentence alignment. Experiments show that this method significantly enhances alignment accuracy and robustness for parallel web pages which are much more diverse and noisy than standard parallel corpora such as "Hansard". With improved sentence alignment performance, web mining systems are able to acquire parallel sentences of higher quality from the web.