This paper addresses spoken document classification (SDC) of German TV news with Support Vector Machines (SVMs). It shows the benefit of weighting different linguistic units when they are combined into a single feature vector. Further experiments show that probabilistic SVMs (pSVMs) with recently introduced couplers perform well on an SDC task. New couplers for multi-category classification are discussed, both for pSVMs and for non-probabilistic SVMs; they are easy to implement and give promising results. Using the distance instead of the decision value turns out to be favorable in some cases. The approaches are justified theoretically, and some of the results are explained theoretically.
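The two ideas in the abstract that are easiest to picture in code are the weighted combination of linguistic units and the choice between decision value and distance when pairwise classifiers are coupled. The sketch below is only an illustration under stated assumptions, not the couplers proposed in the paper: it assumes linear pairwise SVMs given as (w, b), reads "distance" as the decision value divided by the norm of w, and uses a plain sum of signed pairwise outputs as a stand-in coupler. All function names, block names, and numbers are hypothetical.

```python
import math


def weight_and_concat(blocks, weights):
    """L2-normalise each feature block, scale it by its weight, concatenate.

    blocks:  dict mapping a unit name (e.g. "words") to a list of floats
    weights: dict mapping the same names to a scalar weight (assumed given)
    """
    combined = []
    for name, values in blocks.items():
        norm = math.sqrt(sum(v * v for v in values)) or 1.0  # avoid division by zero
        combined.extend(weights[name] * v / norm for v in values)
    return combined


def pairwise_scores(x, classifiers, use_distance):
    """Sum signed pairwise outputs per class (a simple stand-in coupler).

    classifiers maps (i, j) to (w, b) for a linear SVM whose positive side
    is class i and whose negative side is class j.
    """
    scores = {}
    for (i, j), (w, b) in classifiers.items():
        f = sum(wk * xk for wk, xk in zip(w, x)) + b  # raw decision value
        if use_distance:
            f /= math.sqrt(sum(wk * wk for wk in w))  # signed distance to the hyperplane
        scores[i] = scores.get(i, 0.0) + f
        scores[j] = scores.get(j, 0.0) - f
    return scores


def predict(x, classifiers, use_distance=True):
    """Return the class with the highest summed score."""
    scores = pairwise_scores(x, classifiers, use_distance)
    return max(scores, key=scores.get)


# Toy usage with made-up numbers.
features = weight_and_concat(
    {"words": [3.0, 4.0], "phonemes": [0.0, 2.0]},
    {"words": 1.0, "phonemes": 0.5},
)

toy_classifiers = {
    (0, 1): ([10.0, 0.0], 0.0),
    (0, 2): ([-1.0, 0.0], 0.0),
    (1, 2): ([-0.5, 0.0], 0.0),
}
x = [1.0, 0.0]
by_value = predict(x, toy_classifiers, use_distance=False)
by_distance = predict(x, toy_classifiers, use_distance=True)
```

In this toy setup the two rules disagree: summing raw decision values is dominated by the classifier with the largest weight norm, while dividing by the norm puts all pairwise outputs on the same geometric scale. Whether that helps on real data is an empirical question that the paper's experiments address.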