Text mining: classification & clustering of articles related to sports

Authors:
Ritu Arora;Purushotham Bangalore
Affiliations:
University of Alabama at Birmingham, Birmingham, AL;University of Alabama at Birmingham, Birmingham, AL
Venue:
Proceedings of the 43rd annual Southeast regional conference - Volume 1
Year:
2005

Abstract

Text mining addresses the identification of articles related to a particular domain. This paper demonstrates the benefits of combining classification and clustering to group very closely related articles/documents. Classification filters out the articles that do not belong to the domain of interest, and clustering forms subgroups among the classified articles. Combining the two results in tight and accurate clusters. The text mining program developed for this project automates the grouping of related news articles, a feature that can be exploited by news organizations interested in accessing related documents with minimum effort. The articles/documents can come from any source, but they must be in text format for the automated program to process them. This project also serves as a verification of the 'cluster hypothesis', which is explained below.
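The classify-then-cluster idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier or clustering algorithm the authors used, so everything below is an assumption made for illustration: the keyword list `SPORTS_TERMS`, the stopword list, the `min_hits` and `threshold` parameters, and the greedy single-pass clustering on cosine similarity are all hypothetical stand-ins, not the paper's method.

```python
import math
import re
from collections import Counter

# Hypothetical domain vocabulary and stopword list (assumptions for this sketch).
SPORTS_TERMS = {"match", "team", "score", "goal", "league",
                "tournament", "coach", "player"}
STOPWORDS = {"the", "a", "an", "in", "of", "with", "for",
             "her", "his", "as", "and", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def is_sports(text, min_hits=2):
    """Crude stand-in classifier: require min_hits distinct domain terms."""
    return len(SPORTS_TERMS & set(tokenize(text))) >= min_hits


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering: a text joins the first cluster whose
    seed document is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, member_indices)
    for i, text in enumerate(texts):
        vec = Counter(tokenize(text))
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


def group_articles(articles, threshold=0.3):
    """Step 1: keep only in-domain articles. Step 2: cluster what remains."""
    kept = [a for a in articles if is_sports(a)]
    return [[kept[i] for i in members] for members in cluster(kept, threshold)]
```

Given two football reports, one tennis report, and one finance story, `group_articles` would discard the finance story at the classification step and then place the two football reports in one cluster and the tennis report in another. Filtering first keeps out-of-domain documents from diluting the clusters, which is the benefit the abstract attributes to combining the two steps.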