Bipartite graph partitioning and data clustering

Authors:
Hongyuan Zha;Xiaofeng He;Chris Ding;Horst Simon;Ming Gu
Affiliations:
Penn State Univ., State College, PA;Penn State Univ., State College, PA;Berkeley National Lab., Berkeley, CA;Berkeley National Lab., Berkeley, CA;U.C. Berkeley, Berkeley, CA
Venue:
Proceedings of the tenth international conference on Information and knowledge management
Year:
2001

Abstract

Many data types arising from data mining applications can be modeled as bipartite graphs. Examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection between our clustering algorithm and correspondence analysis as used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experiments, we apply our clustering algorithm to document clustering to illustrate its effectiveness and efficiency.
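
The abstract gives no formulas, so the following is only a minimal sketch of how an SVD-based bisection of a bipartite graph might look, not the paper's exact procedure. It rests on assumptions not stated above: the edge weight matrix is scaled by the inverse square roots of its row and column sums, the second singular vector pair carries the partition, and vertices are split at zero. The function name `bipartite_spectral_bisect` is hypothetical, and a full SVD stands in for the partial SVD for brevity.

```python
import numpy as np


def bipartite_spectral_bisect(W):
    """Split the rows and columns of a nonnegative weight matrix into two co-clusters.

    W[i, j] is the edge weight between row vertex i (e.g. a document)
    and column vertex j (e.g. a term). Assumes at least 2 rows and 2 columns.
    """
    W = np.asarray(W, dtype=float)
    row_deg = W.sum(axis=1)
    col_deg = W.sum(axis=0)
    if np.any(row_deg == 0) or np.any(col_deg == 0):
        raise ValueError("every row and column needs at least one positive weight")

    # Assumed normalization: scale by inverse square roots of the vertex degrees.
    d1 = 1.0 / np.sqrt(row_deg)
    d2 = 1.0 / np.sqrt(col_deg)
    W_hat = d1[:, None] * W * d2[None, :]

    # Full SVD for simplicity; a partial SVD would suffice for large sparse data.
    U, s, Vt = np.linalg.svd(W_hat, full_matrices=False)

    # The leading singular pair is trivial (it only reflects the degrees),
    # so the second pair is used to separate the vertices.
    x = d1 * U[:, 1]
    y = d2 * Vt[1, :]

    # Assumed cut point: threshold at zero.
    row_labels = (x >= 0).astype(int)
    col_labels = (y >= 0).astype(int)
    return row_labels, col_labels


# Toy example: two dense blocks joined by weak cross edges.
W = np.array([
    [5.1, 4.1, 0.1, 0.1],
    [4.1, 5.1, 0.1, 0.1],
    [0.1, 0.1, 3.1, 4.1],
    [0.1, 0.1, 4.1, 3.1],
])
rows, cols = bipartite_spectral_bisect(W)
```

In this toy matrix the first two rows and columns end up in one co-cluster and the last two in the other. Which cluster receives label 0 or 1 depends on the sign convention of the SVD routine, so only the grouping is meaningful.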