Enhancing the stability and efficiency of spectral ordering with partial supervision and feature selection

Authors:
Dimitrios Mavroeidis; Ella Bingham
Affiliations:
Department of Informatics, Athens University of Economics and Business, Athens, Greece; Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
Venue:
Knowledge and Information Systems
Year:
2010

Abstract

Several studies have demonstrated the prospects of spectral ordering for data mining. One successful application is the seriation of paleontological findings, i.e. ordering the excavation sites using only data on mammal co-occurrences. However, spectral ordering ignores the background knowledge that is naturally present in the domain: paleontologists can estimate the ages of the sites to within some accuracy. This age information is itself uncertain, so the best approach is to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision, we propose a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix so that domain knowledge is taken into account. The algorithm also performs feature selection by discarding the features that contribute most to the unwanted variability of the data under bootstrap sampling. We further demonstrate the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages, where the task is to recover the underlying flow of discussion. We analyze the theoretical properties of the algorithm and show that the proposed framework enhances the stability of the spectral ordering output and yields computational gains.
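
For orientation, the unsupervised baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain spectral ordering only: items are sorted by the entries of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the graph Laplacian. The abstract does not specify how the Laplacian is modified for partial supervision or how the bootstrap feature selection works, so neither is reproduced here. The function names (`laplacian`, `fiedler_vector`, `spectral_order`), the toy similarity matrix, and the use of shifted power iteration in place of a library eigensolver are all assumptions made for this example; the sketch also assumes a connected similarity graph.

```python
import math


def laplacian(sim):
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix."""
    n = len(sim)
    lap = [[-sim[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal holds the degree: the sum of similarities to all other items.
        lap[i][i] = sum(sim[i][j] for j in range(n) if j != i)
    return lap


def fiedler_vector(lap, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (shift * I - L), projecting out the constant vector
    (the eigenvector of eigenvalue 0) at every step. Assumes a connected graph.
    """
    n = len(lap)
    # Gershgorin bound: no eigenvalue of L exceeds twice the largest degree.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    # Deterministic zero-mean starting vector (an arbitrary choice for this sketch).
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        w = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # stay orthogonal to the constant vector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_order(sim):
    """Order item indices by their Fiedler-vector entries."""
    f = fiedler_vector(laplacian(sim))
    return sorted(range(len(sim)), key=lambda i: f[i])


# Toy data (hypothetical): five items whose hidden linear order is 3, 0, 4, 1, 2.
# Only neighbours in the hidden order are similar to each other.
hidden = [3, 0, 4, 1, 2]
n = len(hidden)
sim = [[0.0] * n for _ in range(n)]
for a, b in zip(hidden, hidden[1:]):
    sim[a][b] = sim[b][a] = 1.0

order = spectral_order(sim)
print(order)  # the hidden order, possibly reversed
```

The direction of the recovered ordering is arbitrary, because an eigenvector is only defined up to sign. This is one place where background knowledge such as approximate site ages could plausibly help, although how the paper actually uses it is not stated in the abstract.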