This paper explores how to build an accurate prepositional phrase attachment corpus for new genres by crowd-sourcing judgments, avoiding a large investment of time and money. We develop and present a system that extracts prepositional phrases and their potential attachments from ungrammatical and informal sentences, then poses the resulting disambiguation tasks as multiple choice questions to workers on Amazon's Mechanical Turk service. Our analysis shows that this two-step approach can produce reliable annotations on informal and potentially noisy blog text, and that this semi-automated strategy holds promise for similar annotation projects in new genres.
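The second step of the approach, posing a disambiguation task as a multiple choice question and combining worker judgments, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `AttachmentQuestion` class, the `aggregate` function, the question wording, the example sentence, and the strict-majority rule are hypothetical and are not taken from the paper, which does not specify its aggregation method in this abstract. Candidate attachment sites are supplied by hand here; in the described system they would come from the automatic extraction step.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AttachmentQuestion:
    """One multiple choice item: a PP and its candidate attachment sites (hypothetical structure)."""
    sentence: str
    pp: str
    candidates: List[str]

    def render(self) -> str:
        # Build the plain-text question a worker would see.
        lines = [
            f"Sentence: {self.sentence}",
            f'Which word or phrase does "{self.pp}" modify?',
        ]
        lines += [f"  ({i + 1}) {c}" for i, c in enumerate(self.candidates)]
        return "\n".join(lines)


def aggregate(
    question: AttachmentQuestion,
    answers: List[int],
    min_agreement: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Combine worker answers (0-based candidate indices) by strict majority.

    Returns (chosen candidate, agreement ratio). The candidate is None when
    no option exceeds min_agreement, so the item can be flagged for review.
    Majority voting is an assumed aggregation rule, not the paper's.
    """
    if not answers:
        return None, 0.0
    index, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    if agreement <= min_agreement:
        return None, agreement
    return question.candidates[index], agreement


if __name__ == "__main__":
    # Invented informal sentence in the style of blog text.
    q = AttachmentQuestion(
        sentence="i saw the movie with my friends lol",
        pp="with my friends",
        candidates=["saw", "the movie"],
    )
    print(q.render())
    print(aggregate(q, [0, 0, 1]))
```

Returning the agreement ratio alongside the label keeps low-consensus items visible, which is one simple way to separate reliable annotations from those that may need expert review.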