Document clustering via adaptive subspace iteration

Authors:
Tao Li; Sheng Ma; Mitsunori Ogihara
Affiliations:
University of Rochester, Rochester, NY; IBM T.J. Watson Research Center, Hawthorne, NY; University of Rochester, Rochester, NY
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Citing 26
Cited 35

Algorithms for clustering data

Algorithms for clustering data
Applied multivariate statistical analysis

Applied multivariate statistical analysis
Elements of information theory

Elements of information theory
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
WebACE: a Web agent for document categorization and exploration

AGENTS '98 Proceedings of the second international conference on Autonomous agents
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Document Categorization and Query Generation on the World Wide Web Using WebACE

Artificial Intelligence Review - Special issue on data mining on the Internet
Document clustering using word clusters via the information bottleneck method

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Normalized Cuts and Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Data mining: concepts and techniques

Data mining: concepts and techniques
Clustering by Scale-Space Filtering

IEEE Transactions on Pattern Analysis and Machine Intelligence
Clustering Algorithms

Clustering Algorithms
Locally Adaptive Metric Nearest-Neighbor Classification

IEEE Transactions on Pattern Analysis and Machine Intelligence
A Factorization Approach to Grouping

ECCV '98 Proceedings of the 5th European Conference on Computer Vision, Volume I
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces

DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
Adaptive dimension reduction for clustering high dimensional data

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Segmentation Using Eigenvectors: A Unifying View

ICCV '99 Proceedings of the International Conference on Computer Vision, Volume 2
Document clustering based on non-negative matrix factorization

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Spectral partitioning works: planar graphs and finite element meshes

FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning

Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning
Sufficient dimensionality reduction

The Journal of Machine Learning Research
Information-theoretic co-clustering

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient multi-way text categorization via generalized discriminant analysis

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

On combining multiple clusterings

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Document Clustering Using Locality Preserving Indexing

IEEE Transactions on Knowledge and Data Engineering
A Unified View on Clustering Binary Data

Machine Learning
A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
NMF and PLSI: equivalence and a hybrid algorithm

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Orthogonal nonnegative matrix t-factorizations for clustering

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering

Information Processing and Management: an International Journal
Adaptive dimension reduction using discriminant analysis and K-means clustering

Proceedings of the 24th international conference on Machine learning
Regularized clustering for documents

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Possibilistic fuzzy co-clustering of large document collections

Pattern Recognition
Biomedical ontology improves biomedical literature clustering performance: a comparison study

International Journal of Bioinformatics Research and Applications
Meta methods for model sharing in personal information systems

ACM Transactions on Information Systems (TOIS)
Similarity Measurement of XML Documents Based on Structure and Contents

ICCS '07 Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007
Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization

IEA/AIE '08 Proceedings of the 21st international conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: New Frontiers in Applied Artificial Intelligence
Comparing Non-parametric Ensemble Methods for Document Clustering

NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
Clustering based on matrix approximation: a unifying view

Knowledge and Information Systems
Clustering multi-way data via adaptive subspace iteration

Proceedings of the 17th ACM conference on Information and knowledge management
Fuzzy-Adaptive-Subspace-Iteration-Based Two-Way Clustering of Microarray Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Dynamicity vs. effectiveness: studying online clustering for scatter/gather

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
K-Subspace Clustering

ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
Subspace maximum margin clustering

Proceedings of the 18th ACM conference on Information and knowledge management
Spectral embedded clustering

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Document Clustering with Cluster Refinement and Non-negative Matrix Factorization

ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
Normalized dimensionality reduction using nonnegative matrix factorization

Neurocomputing
On combining multiple clusterings: an overview and a new perspective

Applied Intelligence
Document clustering using NMF and fuzzy relation

Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
Integrating Document Clustering and Multidocument Summarization

ACM Transactions on Knowledge Discovery from Data (TKDD)
Image analysis with nonlinear adaptive dimension reduction

Proceedings of the Third International Conference on Internet Multimedia Computing and Service
Representing document as dependency graph for document clustering

Proceedings of the 20th ACM international conference on Information and knowledge management
A probabilistic clustering-projection model for discrete data

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Dimensionality reduction by Mixed Kernel Canonical Correlation Analysis

Pattern Recognition
Regularized nonnegative shared subspace learning

Data Mining and Knowledge Discovery
A survey on enhanced subspace clustering

Data Mining and Knowledge Discovery
Learning a subspace for clustering via pattern shrinking

Information Processing and Management: an International Journal
Tensor clustering via adaptive subspace iteration

Intelligent Data Analysis

Abstract

Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously performs data reduction and subspace identification via an iterative alternating optimization procedure. Motivated by the optimization procedure, we then provide a novel method to determine the number of clusters. We also discuss the connections of ASI with various existing clustering approaches. Finally, extensive experimental results on real data sets show the effectiveness of the ASI algorithm.