The manually annotated sub-corpus: a community resource for and by the people

Authors:
Nancy Ide; Christiane Fellbaum; Collin Baker; Rebecca Passonneau
Affiliations:
Vassar College, Poughkeepsie, NY; Princeton University, Princeton, New Jersey; International Computer Science Institute, Berkeley, California; Columbia University, New York
Venue:
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Year:
2010

Abstract

The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort on a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as is or ported to any of a variety of other formats. MASC includes data from a much wider variety of genres than existing multiply annotated corpora of English, and the project is committed to a fully open model of distribution, without restriction, for all data and annotations produced or contributed. As such, MASC is the first large-scale, open, community-based effort to create much-needed language resources for NLP. This paper describes the MASC project, its corpus, and its annotations, and serves as a call for contributions of data and annotations from the language processing community.