Manual evaluation of translation quality is generally thought to be excessively time-consuming and expensive. We explore a fast and inexpensive way of doing it, using Amazon's Mechanical Turk to pay small sums to a large number of non-expert annotators. For $10 we redundantly recreate judgments from a WMT08 translation task. We find that, when combined, non-expert judgments have a high level of agreement with the existing gold-standard judgments of machine translation quality, and correlate more strongly with expert judgments than Bleu does. We go on to show that Mechanical Turk can be used to calculate human-mediated translation edit rate (HTER), to conduct reading comprehension experiments with machine translation, and to create high-quality reference translations.
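As a rough illustration of what combining redundant non-expert judgments and measuring agreement with gold-standard judgments can look like, here is a minimal Python sketch. It assumes simple majority voting, which is only one possible combination scheme and is not necessarily the one used in this work. The segment IDs, the "A>B" preference labels, and the helper names majority_vote and agreement are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties are broken alphabetically
    so the result is deterministic (an arbitrary choice for this sketch)."""
    counts = Counter(labels)
    best = max(counts.values())
    return sorted(label for label, c in counts.items() if c == best)[0]


def agreement(combined, gold):
    """Fraction of gold-labelled items whose combined label matches gold."""
    shared = [item for item in gold if item in combined]
    if not shared:
        return 0.0
    return sum(combined[item] == gold[item] for item in shared) / len(shared)


# Hypothetical toy data: three redundant non-expert judgments per segment,
# each stating which of two system outputs (A or B) is preferred.
non_expert = {
    "seg1": ["A>B", "A>B", "B>A"],
    "seg2": ["B>A", "B>A", "B>A"],
    "seg3": ["A>B", "B>A", "B>A"],
    "seg4": ["A>B", "A>B", "A>B"],
}

# Hypothetical expert (gold-standard) judgments for the same segments.
gold = {"seg1": "A>B", "seg2": "B>A", "seg3": "A>B", "seg4": "A>B"}

combined = {item: majority_vote(votes) for item, votes in non_expert.items()}
print(agreement(combined, gold))  # 0.75: the combined label differs on seg3 only
```

In practice, annotators could also be weighted by how well they agree with a small set of known answers. Plain majority voting is shown here only because it is the simplest scheme to state.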