Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Accuracy and Stability of Numerical Algorithms
Maximizing the spread of influence through a social network
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Assessing data mining results via swap randomization
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Graph evolution: Densification and shrinking diameters
ACM Transactions on Knowledge Discovery from Data (TKDD)
Smooth sensitivity and sampling in private data analysis
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Approximate clustering without the approximation
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Smoothed analysis: an attempt to explain the behavior of algorithms in practice
Communications of the ACM - A View of Parallel Computing
Randomization methods for assessing data analysis results on real-valued matrices
Statistical Analysis and Data Mining
k-Means Has Polynomial Smoothed Complexity
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Efficient confident search in large review corpora
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Testing the Significance of Patterns in Data with Cluster Structure
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Selecting a comprehensive set of reviews
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
The data-mining literature is rich in problems that are formalized as combinatorial-optimization problems. An illustrative example is the entity-selection formulation that has been used to model the problem of selecting a subset of representative reviews from a review corpus [11,22] or important nodes in a social network [10]. Existing combinatorial algorithms for solving such entity-selection problems identify a set of entities (e.g., reviews or nodes) as important. Here, we consider the following question: how do small or large changes in the input dataset change the value or the structure of such reported solutions? We answer this question by developing a general framework for evaluating the smoothness (i.e., consistency) of the data-mining results obtained for the input dataset X. We do so by comparing these results with the results obtained for datasets that are within a small or a large distance from X. The algorithms we design allow us to perform such comparisons effectively and thus approximate the results' smoothness efficiently. Our experimental evaluation on real datasets demonstrates the efficacy and the practical utility of our framework in a wide range of applications.
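The abstract does not specify the distance measure, the entity-selection algorithm, or how solutions are compared. The sketch below is a naive sampling baseline built on stated assumptions, not the authors' algorithm. It treats X as a binary entity-by-attribute matrix and uses a hypothetical greedy coverage selector (`greedy_cover`) as the entity-selection algorithm. The perturbed datasets are assumed to lie at Hamming distance exactly d from X. Structure is compared by the Jaccard similarity of the selected entity sets, and value by the ratio of objective values. Averaging over random perturbations gives a Monte Carlo estimate of how smooth the result is at distance d.

```python
import random
from typing import List, Set, Tuple

Matrix = List[List[int]]  # assumption: rows = entities, columns = binary attributes


def greedy_cover(X: Matrix, k: int) -> Tuple[Set[int], int]:
    """Hypothetical entity-selection algorithm: greedily pick up to k rows that
    together cover the most columns containing a 1. Returns (rows, #covered)."""
    covered: Set[int] = set()
    chosen: Set[int] = set()
    for _ in range(min(k, len(X))):
        best, best_gain = -1, -1
        for i, row in enumerate(X):
            if i in chosen:
                continue
            gain = sum(1 for j, v in enumerate(row) if v and j not in covered)
            if gain > best_gain:  # ties go to the lowest row index
                best, best_gain = i, gain
        chosen.add(best)
        covered.update(j for j, v in enumerate(X[best]) if v)
    return chosen, len(covered)


def perturb(X: Matrix, d: int, rng: random.Random) -> Matrix:
    """Return a copy of X at Hamming distance exactly d by flipping d distinct
    cells (assumed distance model; requires d <= number of cells)."""
    n, m = len(X), len(X[0])
    Y = [row[:] for row in X]
    for cell in rng.sample(range(n * m), d):
        i, j = divmod(cell, m)
        Y[i][j] = 1 - Y[i][j]
    return Y


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Structural similarity of two selected entity sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def smoothness(X: Matrix, k: int, d: int, samples: int = 100,
               seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of (mean Jaccard similarity, mean value ratio)
    between the solution on X and solutions on random datasets at distance d."""
    rng = random.Random(seed)
    base_set, base_val = greedy_cover(X, k)
    sims, ratios = [], []
    for _ in range(samples):
        s, v = greedy_cover(perturb(X, d, rng), k)
        sims.append(jaccard(base_set, s))
        ratios.append(v / base_val if base_val else 1.0)
    return sum(sims) / samples, sum(ratios) / samples


X = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(smoothness(X, k=2, d=0, samples=5))  # d=0 leaves X unchanged: (1.0, 1.0)
print(smoothness(X, k=2, d=4, samples=50))
```

For d = 0 the perturbed dataset is X itself, so both scores are exactly 1. Increasing d moves the sampled datasets further from X, and lower scores indicate a less smooth result. Sampling each distance independently is the simplest possible estimator. It is shown only to make the comparison concrete and says nothing about the more efficient algorithms the abstract refers to.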