Digitisation and automatic alignment of the DIALOG corpus: a prosodically annotated corpus of Czech television debates

  • Authors:
  • Nino Peterek;Petr Kaderka;Zdeňka Svobodová;Eva Havlová;Martin Havlík;Jana Klímová;Patricie Kubáčková

  • Affiliations:
  • Charles University, MFF, Prague and Institute of Formal and Applied Linguistics, ÚFAL;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ;The Academy of Sciences of the Czech Republic, Czech Language Institute, ÚJČ

  • Venue:
  • TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
  • Year:
  • 2007

Abstract

This article describes the development and automatic processing of the audio-visual DIALOG corpus, a prosodically annotated corpus of Czech television debates recorded and annotated at the Czech Language Institute of the Academy of Sciences of the Czech Republic. The corpus has recently grown to more than 400 four-hour VHS tapes and 375 transcribed TV debates. The digitisation process and automatic alignment described here provide an easily accessible, user-friendly research environment that supports the exploration, analysis and modelling of Czech prosody. The project has been carried out in cooperation with the Institute of Formal and Applied Linguistics of the Faculty of Mathematics and Physics, Charles University, Prague. The first version of the DIALOG corpus (version 0.1, http://ujc.dialogy.cz) is currently available to the public; it includes 10 selected and revised hour-long talk shows.