Decision trees for entity identification: Approximation algorithms and hardness results
ACM Transactions on Algorithms (TALG)
We consider the problem of constructing decision trees for entity identification from a given relational table. The input is a table containing information about a set of entities over a fixed set of attributes, together with a probability distribution over the set of entities that specifies the likelihood of occurrence of each entity. The goal is to construct a decision tree that identifies each entity unambiguously by testing attribute values, such that the average number of tests is minimized. This classical problem finds such diverse applications as efficient fault detection, species identification in biology, and efficient diagnosis in the field of medicine. Prior work mainly deals with the special case where the input table is binary and the probability distribution over the set of entities is uniform. We study the general problem involving arbitrary input tables and arbitrary probability distributions over the set of entities. We consider a natural greedy algorithm and prove an approximation guarantee of O(r_K · log N), where N is the number of entities and K is the maximum number of distinct values of an attribute. The value r_K is a suitably defined Ramsey number, which is at most log K. We show that it is NP-hard to approximate the problem within a factor of Ω(log N), even for binary tables (i.e., K = 2). Thus, for the case of binary tables, our approximation algorithm is optimal up to constant factors (since r_2 = 2). In addition, our analysis indicates a possible way of resolving a Ramsey-theoretic conjecture by Erdős.
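To make the objective concrete, here is a minimal Python sketch of a greedy tree builder and of the average-number-of-tests cost. The abstract does not spell out the greedy rule, so the rule used here (pick the attribute that separates the most pairs of entities in the current set, ignoring probabilities) is an assumption, and it may differ from the algorithm analyzed in the paper. The names `build_tree`, `expected_tests`, and `split_by_attribute`, as well as the example table and probabilities, are hypothetical.

```python
from collections import defaultdict


def split_by_attribute(table, entities, attr):
    """Group entity ids by their value on attribute index `attr`."""
    groups = defaultdict(list)
    for e in entities:
        groups[table[e][attr]].append(e)
    return groups


def build_tree(table, entities, num_attrs):
    """Greedy tree: a leaf is an entity id, an internal node is (attr, {value: subtree}).

    Assumed rule (not taken from the abstract): test the attribute that
    separates the largest number of entity pairs in the current set.
    """
    if len(entities) == 1:
        return entities[0]

    def pairs_separated(attr):
        sizes = [len(g) for g in split_by_attribute(table, entities, attr).values()]
        n = len(entities)
        # Pairs that stay together inside one group are not separated.
        return n * (n - 1) // 2 - sum(s * (s - 1) // 2 for s in sizes)

    best = max(range(num_attrs), key=pairs_separated)
    if pairs_separated(best) == 0:
        raise ValueError("some entities have identical rows and cannot be identified")
    groups = split_by_attribute(table, entities, best)
    return (best, {v: build_tree(table, g, num_attrs) for v, g in groups.items()})


def expected_tests(tree, prob, depth=0):
    """Average number of tests: sum over entities of probability times leaf depth."""
    if not isinstance(tree, tuple):
        return prob[tree] * depth
    _, children = tree
    return sum(expected_tests(child, prob, depth + 1) for child in children.values())


# Hypothetical input: 4 entities, 2 attributes, K = 3 distinct values at most.
table = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (2, 1)}
prob = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

tree = build_tree(table, list(table), num_attrs=2)
print(tree)
print(expected_tests(tree, prob))
```

In this example the first attribute is tested at the root because it separates five of the six entity pairs. Entities a and b are then identified after one test, and c and d after two, for an average of about 1.3 tests.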