Document clustering with universum

Authors:
Dan Zhang; Jingdong Wang; Luo Si
Affiliations:
Purdue University, West Lafayette, IN, USA; Microsoft Research Asia, Beijing, China; Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Document clustering is a popular research topic that aims to partition documents into groups of similar objects (i.e., clusters). It has been widely used in applications such as automatic topic extraction, document organization, and filtering. Universum, a recently proposed concept, is a collection of "non-examples" that do not belong to any concept or cluster of interest. This paper proposes a novel document clustering technique, Document Clustering with Universum, which uses Universum examples to improve clustering performance. The intuition is that Universum examples can serve as supervised information, since they are known not to belong to any meaningful concept or cluster in the target domain. In particular, a maximum margin clustering method is proposed that models both target examples and Universum examples. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.
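
To make the intuition concrete, here is a minimal sketch of how a margin-based clustering objective might combine the two kinds of examples. It is an illustration under stated assumptions, not the formulation used in the paper: it assumes a binary, linear decision function f(x) = w·x + b, a hinge loss on |f(x)| for unlabeled target documents, and the ε-insensitive loss commonly used for Universum examples. The function name `mmc_universum_objective`, the weights `C` and `C_u`, the value of `eps`, and the toy data are all hypothetical.

```python
import numpy as np

def mmc_universum_objective(w, b, X_target, X_universum, C=1.0, C_u=1.0, eps=0.1):
    """Illustrative binary, linear objective (an assumption, not the paper's exact model)."""
    f_target = X_target @ w + b
    f_universum = X_universum @ w + b
    # Unlabeled target documents: hinge loss on |f(x)| rewards lying far
    # from the decision boundary, on either side.
    target_loss = np.maximum(0.0, 1.0 - np.abs(f_target)).sum()
    # Universum documents: eps-insensitive loss rewards lying close to the
    # boundary, since they belong to neither cluster.
    universum_loss = np.maximum(0.0, np.abs(f_universum) - eps).sum()
    # Standard L2 regularizer plus the two weighted loss terms.
    return 0.5 * float(w @ w) + C * target_loss + C_u * universum_loss

# Hypothetical toy data: two target groups along the first axis,
# Universum points along the second axis.
X_target = np.array([[2.0, 0.0], [3.0, 0.0], [-2.0, 0.0], [-3.0, 0.0]])
X_universum = np.array([[0.0, 1.0], [0.0, -1.0]])

# A boundary that splits the target groups and passes through the Universum points...
good = mmc_universum_objective(np.array([1.0, 0.0]), 0.0, X_target, X_universum)
# ...versus one that cuts through the target data and pushes the Universum points away.
bad = mmc_universum_objective(np.array([0.0, 1.0]), 0.0, X_target, X_universum)
print(good < bad)
```

In this sketch the first boundary scores lower because every target point clears the margin while the Universum points sit on the boundary. A full clustering method would also need to optimize over w and b and guard against the trivial solution that assigns all documents to one cluster, which this sketch does not attempt.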