BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Parallel corpora play a crucial role in multilingual natural language processing. Unfortunately, the availability of such resources is the bottleneck in most applications of interest. Mining the web for parallel corpora is a viable solution that comes at a price: it is not always easy to identify parallel documents among the crawled material. In this study, we address the problem of automatically identifying the pairs of texts that are translations of each other in a set of documents. We show that it is possible to automatically build particularly efficient content-based methods that make use of very little lexical knowledge. We also evaluate our approach on a front-end translation task and demonstrate that our parallel text classifier yields better performance than another approach based on a rich lexicon.