Document Indexing With a Concept Hierarchy
NDDL '01 Proceedings of the 1st International Workshop on New Developments in Digital Libraries: in conjunction with ICEIS 2001
Speaker role recognition to help spontaneous conversational speech detection
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
This paper describes a topic segmentation and indexation system for TV broadcast news programs spoken in European Portuguese. The system is integrated into an alert system for selective dissemination of multimedia information, developed within a European project. The goal of this work is to enhance the retrieval of specific spoken documents that have been automatically transcribed using speech recognition. Our segmentation algorithm is based on simple heuristics related to anchor detection. The indexation is based on hierarchical concept trees (a thesaurus) containing 22 main thematic domains, for which hidden Markov models and topic language models were created. Ongoing experiments on multiple topic indexing are also described, in which a confidence measure based on the likelihood ratio test is used as the hypothesis test.
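To make the idea of a likelihood-ratio confidence measure for multiple topic indexing concrete, the following is a minimal sketch and not the system described above. It assumes unigram topic language models with add-one smoothing, a background model trained on all stories as the alternative hypothesis, and a fixed threshold of 0.0; the abstract specifies none of these choices, and the toy training texts and all function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(docs, vocab):
    """Estimate an add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def avg_log_likelihood(model, words):
    """Average per-word log-probability of the in-vocabulary words."""
    return sum(math.log(model[w]) for w in words) / len(words)


def topics_by_likelihood_ratio(story, topic_models, background, threshold=0.0):
    """Return every topic whose log-likelihood ratio exceeds the threshold.

    The ratio compares a topic model against the background model, so a
    story can be assigned zero, one, or several topics.
    """
    words = [w for w in story.split() if w in background]
    if not words:
        return {}
    base = avg_log_likelihood(background, words)
    accepted = {}
    for topic, model in topic_models.items():
        llr = avg_log_likelihood(model, words) - base
        if llr > threshold:
            accepted[topic] = llr
    return accepted


# Toy training data (hypothetical, for illustration only).
training = {
    "sports": ["the team won the match", "the coach praised the team"],
    "economy": ["the bank raised interest rates", "the market fell as rates rose"],
}
all_docs = [d for docs in training.values() for d in docs]
vocab = {w for d in all_docs for w in d.split()}

topic_models = {t: train_unigram(docs, vocab) for t, docs in training.items()}
background = train_unigram(all_docs, vocab)

accepted = topics_by_likelihood_ratio("the team won", topic_models, background)
print(sorted(accepted))
```

Because each topic is tested independently against the background model, the number of topics assigned to a story is not fixed in advance. In practice the threshold would be tuned on held-out data rather than set to zero.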