Supervised collaboration for syntactic annotation of Quranic Arabic

Authors:
Kais Dukes; Eric Atwell; Nizar Habash
Affiliations:
University of Leeds, Leeds, UK; University of Leeds, Leeds, UK; Columbia University, New York, USA
Venue:
Language Resources and Evaluation
Year:
2013

Abstract

The Quranic Arabic Corpus (http://corpus.quran.com) is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation (Dukes and Habash 2010) and syntactic analysis using dependency grammar (Dukes and Buckwalter 2010). The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old central religious text of Islam. This project contrasts with other Arabic treebanks by providing a deep linguistic model based on the historical traditional grammar known as iʿrāb (إعراب). By adapting this well-known canon of Quranic grammar into a familiar tagset, it is possible to encourage online annotation by Arabic linguists and Quranic experts. This article presents a new approach to linguistic annotation of an Arabic corpus: online supervised collaboration using a multi-stage approach. The different stages include automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading. A popular website attracting thousands of visitors per day, the Quranic Arabic Corpus has approximately 100 unpaid volunteer annotators, each suggesting corrections to existing linguistic tagging. To ensure a high-quality resource, a small number of expert annotators are promoted to a supervisory role, allowing them to review or veto suggestions made by other collaborators. The Quran also benefits from a large body of existing historical grammatical analysis, which may be leveraged during this review. In this paper we evaluate and report on the effectiveness of the chosen annotation methodology. We also discuss the unique challenges of annotating Quranic Arabic online and describe the custom linguistic software used to aid collaborative annotation.
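
The supervised proofreading stage described in the abstract can be illustrated with a minimal sketch. This is not the project's actual software: the class names, the token identifier format and the tag values below are all hypothetical, chosen only to show the general idea that volunteers queue suggestions while only supervisors may accept or veto them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    VETOED = "vetoed"


@dataclass(eq=False)  # compare suggestions by identity, not by field values
class Suggestion:
    token_id: str       # hypothetical identifier, e.g. "1:1:1"
    proposed_tag: str   # hypothetical tag value
    author: str
    status: Status = Status.PENDING


@dataclass
class Token:
    token_id: str
    tag: str                                      # current tag, e.g. from automatic tagging
    history: list = field(default_factory=list)   # earlier tags, oldest first


class Corpus:
    """Toy model: volunteers suggest corrections, supervisors decide."""

    def __init__(self, tokens, supervisors):
        self.tokens = {t.token_id: t for t in tokens}
        self.supervisors = set(supervisors)
        self.queue = []  # suggestions awaiting review

    def suggest(self, author, token_id, proposed_tag):
        """Record a suggestion without changing the published tag."""
        if token_id not in self.tokens:
            raise KeyError(token_id)
        suggestion = Suggestion(token_id, proposed_tag, author)
        self.queue.append(suggestion)
        return suggestion

    def review(self, supervisor, suggestion, accept):
        """Accept or veto a pending suggestion; supervisors only."""
        if supervisor not in self.supervisors:
            raise PermissionError(f"{supervisor} is not a supervisor")
        if suggestion.status is not Status.PENDING:
            raise ValueError("suggestion has already been reviewed")
        token = self.tokens[suggestion.token_id]
        if accept:
            token.history.append(token.tag)
            token.tag = suggestion.proposed_tag
            suggestion.status = Status.ACCEPTED
        else:
            suggestion.status = Status.VETOED
        self.queue.remove(suggestion)


corpus = Corpus([Token("1:1:1", "N")], supervisors=["supervisor_a"])
s = corpus.suggest("volunteer_b", "1:1:1", "V")
corpus.review("supervisor_a", s, accept=True)
print(corpus.tokens["1:1:1"].tag)
```

In this sketch a suggestion never alters the published tag directly; the change only takes effect once a supervisor accepts it, and the previous tag is kept in the token's history.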