Estimating entity importance via counting set covers

Authors:
Aristides Gionis; Theodoros Lappas; Evimaria Terzi
Affiliations:
Yahoo! Research, Barcelona, Spain; Boston University, Boston, MA, USA; Boston University, Boston, MA, USA
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

The data-mining literature is rich in problems that ask to assess the importance of entities in a given dataset. At a high level, existing work identifies important entities either by ranking or by selection. Ranking methods assign a score to every entity in the population and then use the assigned scores to create a ranked list. The major shortcoming of such approaches is that they ignore the redundancy between high-ranked entities, which may in fact be very similar or even identical. Therefore, in scenarios where diversity is desirable, such methods perform poorly. Selection methods overcome this drawback by evaluating the importance of a group of entities collectively. To achieve this, they typically adopt a set-cover formulation, which identifies the entities in the minimum set cover as the important ones. However, this dichotomy of entities conceals the fact that, even though an entity may not be in the reported cover, it may still participate in many other optimal or near-optimal solutions. In this paper, we propose a framework that overcomes the above drawbacks by integrating the ranking and selection paradigms. Our approach assigns importance scores to entities based on both the number and the quality of the set-cover solutions in which they participate. Our algorithmic contribution lies in the design of an efficient algorithm for approximating the number of high-quality set covers in which each entity participates. Our methodology applies to a wide range of applications. In a user study and an experimental evaluation on real data, we demonstrate that our framework is efficient and provides useful and intuitive results.
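
The scoring idea in the abstract can be illustrated on a toy instance. The sketch below is not the paper's approximation algorithm: it enumerates all covers by brute force, which is exponential in the number of entities and only feasible for tiny inputs. It also rests on assumptions not stated in the abstract: each entity is modelled as a subset of a universe of items, and "quality" is reduced to a bound on cover size (the hypothetical parameter `max_size`). The function name `cover_counts` is likewise illustrative.

```python
from itertools import combinations


def cover_counts(universe, sets, max_size):
    """Count, for each entity, the set covers of size <= max_size that contain it.

    universe: set of items that must be covered.
    sets:     dict mapping an entity name to the set of items it covers.
    max_size: assumed quality threshold; only covers with at most this many
              entities are counted.

    Brute-force enumeration, intended for toy inputs only.
    """
    names = sorted(sets)
    counts = {name: 0 for name in names}
    total = 0
    for k in range(1, max_size + 1):
        for combo in combinations(names, k):
            # Union of the items covered by the chosen entities.
            covered = set().union(*(sets[n] for n in combo))
            if covered >= universe:  # combo is a set cover
                total += 1
                for n in combo:
                    counts[n] += 1
    return counts, total


if __name__ == "__main__":
    universe = {1, 2, 3, 4}
    sets = {"a": {1, 2}, "b": {3, 4}, "c": {1, 2, 3}, "d": {4}}
    counts, total = cover_counts(universe, sets, max_size=2)
    print(total)   # 3 covers of size at most 2: {a,b}, {b,c}, {c,d}
    print(counts)  # {'a': 1, 'b': 2, 'c': 2, 'd': 1}
```

In this toy instance every entity appears in at least one minimum cover, so reporting a single minimum cover such as {a, b} would hide the fact that c participates in more optimal solutions than a does. The per-entity counts make that difference visible.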