Network flows: theory, algorithms, and applications
Network flows: theory, algorithms, and applications
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
An Information-Theoretic Definition of Similarity
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Models of translational equivalence among words
Computational Linguistics
Mining the Web for bilingual text
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Fast decoding and optimal decoding for machine translation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
The Candide system for machine translation
HLT '94 Proceedings of the workshop on Human Language Technology
Computational Linguistics - Special issue on web as corpus
Bootstrapping parallel corpora
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
iSTART: paraphrase recognition
ACLstudent '04 Proceedings of the ACL 2004 workshop on Student research
Statistical machine translation
ACM Computing Surveys (CSUR)
A simple sentence-level extraction algorithm for comparable data
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Finding translations in scanned book collections
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
This paper presents a technique for discovering translationally equivalent texts. It is comprised of the application of a matching algorithm at two different levels of analysis and a well-founded similarity score. This approach can be applied to any multilingual corpus using any kind of translation lexicon; it is therefore adaptable to varying levels of multilingual resource availability. Experimental results are shown on two tasks: a search for matching thirty-word segments in a corpus where some segments are mutual translations, and classification of candidate pairs of web pages that may or may not be translations of each other. The latter results compare competitively with previous, document-structure-based approaches to the same problem.