Compilation, transcription and usage of a reference speech corpus: the case of the Slovene corpus GOS

Authors:
Darinka Verdonik;Iztok Kosem;Ana Zwitter Vitez;Simon Krek;Marko Stabej
Affiliations:
University of Maribor, Maribor, Slovenia;Trojina, Institute for Applied Slovene Studies, Škofja Loka, Slovenia;Trojina, Institute for Applied Slovene Studies, Škofja Loka, Slovenia;Amebis, d.o.o., Kamnik, Slovenia;University of Ljubljana, Ljubljana, Slovenia
Venue:
Language Resources and Evaluation
Year:
2013

Abstract

In recent years, building reference speech corpora has been an important part of establishing the necessary linguistic infrastructure in many European countries, both for languages with many speakers (e.g., French, German, Spanish, Italian) and for those with fewer speakers (e.g., Swedish, Dutch, Czech, Slovak). This paper describes how a reference speech corpus is created and distributed to potential users, using the Slovene corpus GOS as a case study. It covers the corpus structure, fieldwork experience with recording, the labelling system, and the two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.