Collecting paraphrase corpora from volunteer contributors

Author: Timothy Chklovski
Affiliation: University of Southern California, Marina del Rey, CA
Venue: Proceedings of the 3rd International Conference on Knowledge Capture
Year: 2005

Abstract

Extensive and deep paraphrase corpora are important for a variety of natural language processing and user interaction tasks. In this paper, we present an approach that (i) collects multiple paraphrases of each given item from volunteers and (ii) incentivizes volunteer contributors to contribute responsibly. We solicit paraphrases from Web volunteers in two ways: by collecting new paraphrases with no prompting, and by asking contributors to guess partially obfuscated paraphrases. To test the approach, we implemented an online game, 1001 Paraphrases (http://ai-games.org/paraphrase.html), and deployed it to collect 20,944 entries covering paraphrases of 400 statements. The approach complements existing text extraction methods and has some unique inherent advantages. We present and motivate our design, and share preliminary observations and lessons learned about the performance of the approach.
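The abstract does not specify how a paraphrase is partially obfuscated or how a contributor's guess is judged. As an illustration only, the following Python sketch assumes word-level masking (each hidden word is replaced by underscores of the same length, with a chosen fraction of words left visible) and treats a guess as correct when its normalized word tokens exactly match the target's. The names `obfuscate`, `guess_matches`, and `reveal_fraction` are hypothetical and are not taken from 1001 Paraphrases.

```python
import random
import re

MASK_CHAR = "_"


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


def obfuscate(paraphrase: str, reveal_fraction: float, seed: int = 0) -> str:
    """Hypothetical masking: keep a random fraction of words visible and
    replace every other word with underscores of the same length."""
    if not 0.0 <= reveal_fraction <= 1.0:
        raise ValueError("reveal_fraction must be between 0 and 1")
    words = tokenize(paraphrase)
    rng = random.Random(seed)  # seeded so the same hint can be reproduced
    n_reveal = round(len(words) * reveal_fraction)
    revealed = set(rng.sample(range(len(words)), n_reveal))
    return " ".join(
        word if i in revealed else MASK_CHAR * len(word)
        for i, word in enumerate(words)
    )


def guess_matches(guess: str, target: str) -> bool:
    # Assumption: a guess counts only if its tokens exactly equal the target's.
    return tokenize(guess) == tokenize(target)
```

For example, `obfuscate("the cat sat", 0.0)` returns `"___ ___ ___"`, and `guess_matches("The cat sat.", "the cat sat")` returns `True`. Keeping word lengths visible gives the guesser a hint while still requiring them to produce the hidden wording; a real deployment could reasonably use looser matching so that near-miss guesses are still accepted.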