Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
Foundations of statistical natural language processing
Foundations of statistical natural language processing
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
Relationship-based clustering and cluster ensembles for high-dimensional data mining
Relationship-based clustering and cluster ensembles for high-dimensional data mining
Newsjunkie: providing personalized newsfeeds via analysis of information novelty
Proceedings of the 13th international conference on World Wide Web
WWW '05 Proceedings of the 14th international conference on World Wide Web
A clustering method for news articles retrieval system
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Hi-index | 0.00 |
The paper presents an autonomous text classification module for a news web portal for the Romanian language. Statistical natural language processing techniques are combined in order to achieve a completely autonomous functionality of the portal. The news items are automatically collected from a large number of news sources using web syndication. Afterward, machine-learning techniques are used for achieving an automatic classification of the news stream. Firstly, the items are clustered using an agglomerative algorithm and the resulting groups correspond to the main news topics. Thus, more information about each of the main topics is acquired from various news sources. Secondly, text classification algorithms are applied to automatically label each cluster of news items in a predetermined number of classes. More than a thousand news items were employed for both the training and the evaluation of the classifiers. The paper presents a complete comparison of the results obtained for each method.