Document clustering is a popular research topic that aims to partition documents into groups of similar objects (i.e., clusters); it has been widely used in applications such as automatic topic extraction, document organization, and filtering. A recently proposed concept, the Universum is a collection of "non-examples" that do not belong to any concept or cluster of interest. This paper proposes a novel document clustering technique, Document Clustering with Universum, which uses Universum examples to improve clustering performance. The intuition is that Universum examples can serve as supervised information, since they are known not to belong to any meaningful concept or cluster in the target domain. In particular, a maximum margin clustering method is proposed that models both target examples and Universum examples. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.
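The intuition above can be illustrated with a small sketch. The abstract does not give the paper's actual formulation, so everything below is an assumption made for illustration: a binary linear model, a margin loss `max(0, 1 - |f(x)|)` on target documents (the cluster label is taken to be the sign of the score), and an epsilon-insensitive loss `max(0, |f(u)| - eps)` that pulls Universum examples toward the decision boundary. The function name, the parameters `C`, `C_u`, and `eps`, and the toy vectors are all hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def universum_mmc_objective(w, b, targets, universum, C=1.0, C_u=1.0, eps=0.1):
    """Illustrative objective (an assumption, not the paper's formulation).

    - Regularizer: 0.5 * ||w||^2.
    - Target loss: penalizes target examples that fall inside the margin,
      whichever side of the hyperplane they are on.
    - Universum loss: penalizes Universum examples that lie farther than
      eps from the hyperplane, since they should belong to neither cluster.
    """
    reg = 0.5 * dot(w, w)
    target_loss = sum(max(0.0, 1.0 - abs(dot(w, x) + b)) for x in targets)
    universum_loss = sum(max(0.0, abs(dot(w, u) + b) - eps) for u in universum)
    return reg + C * target_loss + C_u * universum_loss


# Toy 2-D data (hypothetical): two groups of target documents separated
# along the first axis, and Universum examples spread along the second.
targets = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
universum = [(0.0, 1.0), (0.0, -1.0)]

# A hyperplane that splits the two groups and passes through the Universum.
good = universum_mmc_objective((1.0, 0.0), 0.0, targets, universum)
# A hyperplane that cuts through the targets and pushes the Universum away.
bad = universum_mmc_objective((0.0, 1.0), 0.0, targets, universum)

print(good < bad)
```

In this toy setting the first hyperplane scores lower because it keeps every target example outside the margin while leaving the Universum examples on the boundary. A full method would also need an optimizer and, typically, a cluster-balance constraint to rule out the trivial solution that assigns all documents to one cluster; neither is sketched here.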