Pseudo-Supervised Clustering for Text Documents

Authors: M. Maggini; L. Rigutini; M. Turchi
Affiliation (all authors): Università di Siena, Italy
Venue: WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year: 2004

Abstract

Effective solutions for Web search engines can take advantage of algorithms that automatically organize documents into homogeneous clusters. Unfortunately, document clustering is not an easy task, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned by the feedback of an expert. The feedback is used to choose an appropriate basis for the representation of documents, and the clustering is then performed in the projected space. The algorithms are evaluated on a dataset containing papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
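
The general idea of clustering in a feedback-derived projected space can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how the basis is built or which clustering procedure is used. The sketch assumes, purely for illustration, that the expert supplies a few example documents per topic, that each basis vector is the normalized centroid of one topic's examples, and that plain k-means is run on the projected coordinates. All function names and the toy term vectors are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_basis(expert_feedback):
    """ASSUMPTION: one basis vector per expert-labelled topic (normalized centroid)."""
    return [normalize(centroid(examples))
            for _, examples in sorted(expert_feedback.items())]


def project(doc, basis):
    """Coordinates of a document in the basis: one dot product per basis vector."""
    return [sum(d * b for d, b in zip(doc, axis)) for axis in basis]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (a stand-in; the paper's clustering step may differ)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = centroid(members)
    return labels


def cluster_in_projected_space(docs, expert_feedback, k):
    """Project unit-length term vectors onto the feedback basis, then cluster."""
    basis = build_basis(expert_feedback)
    projected = [project(normalize(doc), basis) for doc in docs]
    return kmeans(projected, k)


# Toy term-frequency vectors over a hypothetical four-term vocabulary.
expert_feedback = {
    "topic_a": [[3, 1, 0, 0]],
    "topic_b": [[0, 0, 1, 3]],
}
docs = [[2, 1, 0, 0], [4, 2, 0, 1], [0, 1, 2, 4], [0, 0, 1, 2]]
labels = cluster_in_projected_space(docs, expert_feedback, k=2)
```

In this toy setting each document is reduced from four term weights to two coordinates, one per expert-labelled topic, so the clustering step operates on the directions the expert considers relevant rather than on the full vocabulary.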