Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.