Large, sparse binary matrices arise in numerous data mining applications, such as the analysis of market baskets, web graphs, social networks, and co-citations, as well as in information retrieval, collaborative filtering, and sparse matrix reordering. Virtually all popular methods for the analysis of such matrices (e.g., k-means clustering, METIS graph partitioning, SVD/PCA, and frequent itemset mining) require the user to specify various parameters, such as the number of clusters, number of principal components, number of partitions, and "support." Choosing suitable values for such parameters is a challenging problem.

Cross-association is a joint decomposition of a binary matrix into disjoint row and column groups such that the rectangular intersections of groups are homogeneous. Starting from first principles, we furnish a clear, information-theoretic criterion for choosing a good cross-association as well as its parameters, namely the number of row and column groups. We provide scalable algorithms to approach the optimum. Our algorithm is parameter-free and requires no user intervention. In practice it scales linearly with the problem size and is thus applicable to very large matrices. Finally, we present experiments on multiple synthetic and real-life datasets, where our method gives high-quality, intuitive results.
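
To make the idea of homogeneous blocks concrete, here is a minimal sketch of one way an information-theoretic score for a cross-association could be computed. It is an illustration under stated assumptions, not the criterion defined in the paper: each block's cells are charged at the Shannon entropy of the block's density of ones, and any cost for describing the groups themselves is left out. The names `binary_entropy` and `block_code_cost` are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) source; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def block_code_cost(matrix, row_groups, col_groups):
    """Bits needed to encode all cells block by block (assumed cost model).

    row_groups[i] and col_groups[j] are integer group labels starting at 0.
    The cost of describing the grouping itself is deliberately omitted.
    """
    k = max(row_groups) + 1
    l = max(col_groups) + 1
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            a, b = row_groups[i], col_groups[j]
            ones[a][b] += value
            cells[a][b] += 1
    # A block with density p costs (number of cells) * H(p) bits.
    return sum(
        cells[a][b] * binary_entropy(ones[a][b] / cells[a][b])
        for a in range(k)
        for b in range(l)
        if cells[a][b] > 0
    )


# Toy matrix with two obvious row groups and two obvious column groups.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
good = block_code_cost(matrix, [0, 0, 1, 1], [0, 0, 1, 1])
bad = block_code_cost(matrix, [0, 1, 0, 1], [0, 1, 0, 1])
print(good, bad)
```

Under this assumed cost, the grouping that yields all-ones and all-zeros blocks is cheaper than the one that mixes them, which is the sense in which homogeneous blocks compress well. Because this score counts only the cells, it can never increase as groups are split further, so on its own it cannot select the number of row and column groups; a usable criterion would also have to charge for describing the grouping, which this sketch ignores.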