This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al., "SWITCHBOARD: Telephone speech corpus for research and development", in Proceedings of ICASSP-92, pp. 517-520, 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody. We have also added substantial new annotations of focus/contrast, further prosody, syllables, and phones. The combined corpus uses the format of the NITE XML Toolkit (NXT), which allows these annotations to be browsed and searched as a coherent set (Carletta et al., Lang Resour Eval J 39(4):313-334, 2005). The resulting corpus is a rich resource for investigating the linguistic features of dialogue and how they interact. Besides describing the corpus itself, we discuss our approach to the issues that arise in such a data integration project. These issues are relevant both to users of the corpus and to others in the language resource community undertaking similar projects.
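NXT keeps annotation layers as stand-off XML that points back to shared base tokens. This is what lets separately produced layers be searched together. The sketch below illustrates that idea only and makes several assumptions. It uses a made-up two-layer format: the `<word>` and `<np>` element names, the `span="first..last"` pointer syntax, and the `infostat`/`animacy` attribute values are all hypothetical and are not the actual NXT schema or API. The code resolves each noun-phrase markable to the words it covers, so that its attributes and its text can be read as one record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-off layers. Element names, ids, attribute
# values and the "first..last" pointer syntax are illustrative assumptions,
# not the real NXT file format.
WORDS_XML = """
<words>
  <word id="w1" orth="uh"/>
  <word id="w2" orth="the"/>
  <word id="w3" orth="dog"/>
  <word id="w4" orth="barked"/>
</words>
"""

NPS_XML = """
<markables>
  <np id="np1" infostat="new" animacy="animal" span="w2..w3"/>
</markables>
"""


def load_words(xml_text):
    """Return word ids in document order and an id -> orthography map."""
    root = ET.fromstring(xml_text)
    order = [w.get("id") for w in root.iter("word")]
    orth = {w.get("id"): w.get("orth") for w in root.iter("word")}
    return order, orth


def resolve_span(span, order):
    """Expand a 'first..last' (or single-id) pointer into the word ids it covers."""
    first, _, last = span.partition("..")
    last = last or first  # a bare id points at a single word
    start, end = order.index(first), order.index(last)
    return order[start:end + 1]


def join_layers(words_xml, nps_xml):
    """Attach the covered word string to every markable in the NP layer."""
    order, orth = load_words(words_xml)
    rows = []
    for np_el in ET.fromstring(nps_xml).iter("np"):
        ids = resolve_span(np_el.get("span"), order)
        text = " ".join(orth[i] for i in ids)
        rows.append((np_el.get("id"), np_el.get("infostat"),
                     np_el.get("animacy"), text))
    return rows


if __name__ == "__main__":
    for row in join_layers(WORDS_XML, NPS_XML):
        print(row)
```

Layers are kept in separate documents and joined only through word ids. A new layer, such as a prosody or focus annotation, can therefore be added without editing the existing files. This is the general property that a stand-off format is meant to provide.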