STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end-product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words per language.