IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Speech retrieval systems use automatic speech recognition (ASR) to generate text for indexing. However, automatic transcripts contain errors, caused either by out-of-vocabulary (OOV) words or by ASR inaccuracy. In this work, we address spoken information retrieval in Turkish, a morphologically rich language in which OOV rates are high. We apply several techniques, such as using subword units and indexing alternative hypotheses, to cope with the OOV problem and ASR inaccuracy. Experiments are performed on our Turkish Broadcast News (BN) Corpus, which also incorporates a spoken IR collection. Results indicate that word segmentation is quite useful, but the effectiveness of indexing alternative hypotheses depends on the retrieval type.
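As a rough illustration of the two ideas named above (subword units and indexing alternative hypotheses), the following minimal sketch builds an inverted index in which each subword token is weighted by the summed posterior of the ASR hypotheses that contain it. It is not the system described in this work: the function names `build_index` and `search`, the hand-written segmentations, the posterior values, and the additive scoring are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index over subword tokens.

    docs maps a document id to a list of (tokens, posterior) pairs, one pair
    per alternative ASR hypothesis. Each token's weight in a document is the
    sum of the posteriors of the hypotheses that contain it (an assumed,
    simplified form of expected term count).
    """
    index = defaultdict(dict)
    for doc_id, hypotheses in docs.items():
        for tokens, posterior in hypotheses:
            for tok in tokens:
                index[tok][doc_id] = index[tok].get(doc_id, 0.0) + posterior
    return index


def search(index, query_tokens):
    """Rank documents by the summed weights of the matching query tokens."""
    scores = defaultdict(float)
    for tok in query_tokens:
        for doc_id, weight in index.get(tok, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: two competing hypotheses for one recording and a
# single hypothesis for another, already split into stem and suffix units.
docs = {
    "bn1": [(["ev", "+ler", "+de"], 0.6), (["ev", "+ler", "+e"], 0.4)],
    "bn2": [(["okul", "+lar", "+da"], 1.0)],
}
index = build_index(docs)
results = search(index, ["ev", "+de"])
```

In this toy setup a query segmented into the same units can match a stem even when the full word form was never recognized as a whole, and a token that appears only in a lower-ranked hypothesis still contributes, with a smaller weight.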