Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Introduction to Information Retrieval
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Robust statistic estimates for adaptation in the task of speech recognition
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Web text data mining for building large scale language modelling corpus
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
The paper presents a topic identification module embedded in a complex system for acquiring and storing large volumes of text data from the Web. The module processes each acquired data item and assigns it keywords from a defined topic hierarchy, which was developed for this purpose and is also described in the paper. The quality of the topic identification is evaluated in two ways: directly, using classic precision-recall measures, and indirectly, by measuring the ASR performance of topic-specific language models built from the automatically filtered data.