Consider each row of a 0-1 dataset as the subset of the columns in which the row has a 1. A dataset is then nested if, for every pair of rows, one row is either a superset or a subset of the other. The concept of nestedness has its origins in ecology, where approximate versions of it have been used to model the distribution of species across different locations. We argue that nestedness and its extensions are interesting properties of datasets, and that they can also be applied to domains other than ecology. We first define natural measures of nestedness and study their properties. We then define the concept of k-nestedness: a dataset is (almost) k-nested if the set of columns can be partitioned into k parts so that each part is (almost) nested. We consider the algorithmic problems of computing how far a dataset is from being k-nested and of finding a good partition of the columns into k parts. The algorithms are based on spectral partitioning and scale to moderately large datasets. We apply the methods to real data from ecology and from other applications, and demonstrate the usefulness of the concept.
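To make the two definitions concrete, the following is a minimal sketch of an exact nestedness check, assuming the dataset is given as a list of 0-1 rows. It is not the spectral algorithm of the paper and does not compute any of the paper's nestedness measures. The function names `is_nested` and `is_nested_under_partition` and the sample data are hypothetical, introduced only for illustration. The check relies on the fact that a family of sets is totally ordered by inclusion exactly when, after sorting by size, each set is contained in the next.

```python
from typing import List, Sequence, Set


def is_nested(rows: Sequence[Sequence[int]]) -> bool:
    """Return True if, for every pair of rows, one row's set of 1-columns
    contains the other's (the exact nestedness definition)."""
    # Turn each row into the set of column indices holding a 1,
    # then sort the sets by size.
    sets: List[Set[int]] = sorted(
        ({j for j, v in enumerate(row) if v} for row in rows), key=len
    )
    # The sets form a chain under inclusion iff each one is a subset
    # of its successor in size order.
    return all(a <= b for a, b in zip(sets, sets[1:]))


def is_nested_under_partition(
    rows: Sequence[Sequence[int]], parts: Sequence[Sequence[int]]
) -> bool:
    """Return True if the dataset restricted to each column part is nested.

    `parts` is a given partition of the column indices into k parts;
    finding a good partition is the hard problem and is not attempted here.
    """
    return all(
        is_nested([[row[j] for j in part] for row in rows]) for part in parts
    )


# Hypothetical sample data (not from the paper).
nested_data = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
mixed_data = [[1, 1, 0, 0], [1, 0, 1, 1], [0, 0, 0, 1]]

nested_whole = is_nested(nested_data)
mixed_whole = is_nested(mixed_data)
mixed_two_parts = is_nested_under_partition(mixed_data, [[0, 1], [2, 3]])
```

In this sketch `mixed_data` is not nested as a whole, because its first two rows are incomparable, yet it is exactly 2-nested under the column partition {0, 1}, {2, 3}. The approximate ("almost") variants described in the abstract would require a distance measure, which is left out here because the abstract does not specify one.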