A framework for evaluating the smoothness of data-mining results

Authors:
Gaurav Misra; Behzad Golshan; Evimaria Terzi
Affiliations:
Computer Science Department, Boston University (all authors)
Venue:
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Year:
2012

Citing 14
Cited 0

Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing

Accuracy and Stability of Numerical Algorithms

Maximizing the spread of influence through a social network
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs
Bioinformatics

Assessing data mining results via swap randomization
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Graph evolution: Densification and shrinking diameters
ACM Transactions on Knowledge Discovery from Data (TKDD)

Smooth sensitivity and sampling in private data analysis
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing

Approximate clustering without the approximation
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms

Smoothed analysis: an attempt to explain the behavior of algorithms in practice
Communications of the ACM - A View of Parallel Computing

Randomization methods for assessing data analysis results on real-valued matrices
Statistical Analysis and Data Mining

k-Means Has Polynomial Smoothed Complexity
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science

Efficient confident search in large review corpora
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II

Testing the Significance of Patterns in Data with Cluster Structure
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining

Selecting a comprehensive set of reviews
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

The data-mining literature is rich in problems that are formalized as combinatorial-optimization problems. An indicative example is the entity-selection formulation, which has been used to model the problem of selecting a subset of representative reviews from a review corpus [11,22] or important nodes in a social network [10]. Existing combinatorial algorithms for solving such entity-selection problems identify a set of entities (e.g., reviews or nodes) as important. Here, we consider the following question: how do small or large changes in the input dataset change the value or the structure of the reported solutions? We answer this question by developing a general framework for evaluating the smoothness (i.e., consistency) of the data-mining results obtained for the input dataset X. We do so by comparing these results with the results obtained for datasets that are within a small or a large distance from X. The algorithms we design allow us to perform such comparisons effectively and thus approximate the results' smoothness efficiently. Our experimental evaluation on real datasets demonstrates the efficacy and the practical utility of our framework in a wide range of applications.
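
As an illustration only, the comparison described in the abstract can be sketched in Python. This is not the authors' algorithm: the dataset representation (each entity as a set of column indices), the greedy coverage objective, the perturbation by random entry flips, and the names `greedy_cover`, `perturb`, and `estimate_smoothness` are all assumptions made for this sketch. It estimates smoothness as the average absolute change in the solution value when the input is replaced by randomly sampled nearby datasets.

```python
import random


def greedy_cover(rows, k):
    """Hypothetical entity-selection step: greedily pick up to k rows
    (entities) that together cover the most columns."""
    chosen, covered = [], set()
    for _ in range(min(k, len(rows))):
        best = max(
            (i for i in range(len(rows)) if i not in chosen),
            key=lambda i: len(rows[i] - covered),
        )
        chosen.append(best)
        covered |= rows[best]
    return chosen, len(covered)


def perturb(rows, n_cols, flips, rng):
    """Return a copy of the dataset with up to `flips` entries toggled.
    The same entry may be drawn twice, so the distance is at most `flips`."""
    new_rows = [set(r) for r in rows]
    for _ in range(flips):
        i = rng.randrange(len(new_rows))
        j = rng.randrange(n_cols)
        new_rows[i] ^= {j}  # toggle membership of column j in row i
    return new_rows


def estimate_smoothness(rows, n_cols, k, flips, samples=100, seed=0):
    """Assumed smoothness estimate: mean absolute difference between the
    solution value on X and on sampled datasets near X."""
    rng = random.Random(seed)
    _, base_value = greedy_cover(rows, k)
    diffs = []
    for _ in range(samples):
        _, value = greedy_cover(perturb(rows, n_cols, flips, rng), k)
        diffs.append(abs(value - base_value))
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    X = [{0, 1, 2}, {2, 3}, {4}, {0, 4}]
    for flips in (0, 1, 5):
        print(flips, estimate_smoothness(X, n_cols=5, k=2, flips=flips))
```

In this sketch, a value near zero means the solution value barely moves under perturbations of the chosen size; varying `flips` corresponds to probing datasets at a small or a large distance from X. Comparing the structure of the solutions (the selected entities themselves) would need a different difference measure, which is omitted here.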