Approximating Decision Trees with Multiway Branches

Authors:
Venkatesan T. Chakaravarthy;Vinayaka Pandit;Sambuddha Roy;Yogish Sabharwal
Affiliations:
IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India
Venue:
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Year:
2009

Abstract

We consider the problem of constructing decision trees for entity identification from a given table. The input is a table containing information about a set of N entities over a fixed set of attributes. The goal is to construct a decision tree that identifies each entity unambiguously by testing attribute values, such that the average number of tests is minimized. The previously best-known approximation ratio for this problem was O(log² N). In this paper, we present a new greedy heuristic that yields an improved approximation ratio of O(log N).
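To make the problem setting concrete, the following Python sketch builds a multiway decision tree over a small table and measures the average number of tests needed to identify an entity. It assumes that attribute values are encoded as integers, that every test costs one unit, and that all entities are equally likely. The attribute-selection rule (test the attribute that separates the most pairs of still-ambiguous entities) is an illustrative greedy rule chosen for this example only; it is not claimed to be the heuristic from the paper, and no approximation guarantee is implied. The names build_tree, pairs_separated, identify, leaf_depths and average_tests are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding: table[i][j] is the value of attribute j for entity i.
# A tree is either a leaf (an entity index, int) or an internal node
# (attribute index, {attribute value: subtree}), i.e. a multiway branch.


def pairs_separated(table, entities, attr):
    """Count pairs of entities in `entities` that differ on attribute `attr`."""
    return sum(
        1 for a, b in combinations(entities, 2) if table[a][attr] != table[b][attr]
    )


def build_tree(table, entities=None):
    """Greedily build a tree that isolates every entity in its own leaf.

    Selection rule (illustrative assumption, not the paper's heuristic):
    test the attribute that separates the most still-ambiguous pairs.
    """
    if entities is None:
        entities = list(range(len(table)))
    if len(entities) == 1:
        return entities[0]
    n_attrs = len(table[0])
    best = max(range(n_attrs), key=lambda j: pairs_separated(table, entities, j))
    if pairs_separated(table, entities, best) == 0:
        raise ValueError(f"entities {entities} have identical rows")
    # One child per distinct value of the chosen attribute.
    groups = {}
    for e in entities:
        groups.setdefault(table[e][best], []).append(e)
    return (best, {v: build_tree(table, g) for v, g in groups.items()})


def identify(tree, row):
    """Follow the tests on `row` down to a leaf and return the entity index."""
    while not isinstance(tree, int):
        attr, children = tree
        tree = children[row[attr]]
    return tree


def leaf_depths(tree, depth=0):
    """Number of tests on the root-to-leaf path for each leaf."""
    if isinstance(tree, int):
        return [depth]
    _, children = tree
    return [d for child in children.values() for d in leaf_depths(child, depth + 1)]


def average_tests(tree):
    """Average number of tests, assuming all entities are equally likely."""
    depths = leaf_depths(tree)
    return sum(depths) / len(depths)


if __name__ == "__main__":
    table = [
        [0, 1, 2],  # entity 0
        [0, 2, 2],  # entity 1
        [1, 1, 0],  # entity 2
        [2, 0, 0],  # entity 3
    ]
    tree = build_tree(table)
    print(tree)
    print(average_tests(tree))
```

On the four-entity table in the `__main__` block, the first test is on attribute 0, which isolates entities 2 and 3; entities 0 and 1 share a value there and need a second test on attribute 1, so the average is (1 + 1 + 2 + 2) / 4 = 1.5 tests. Counting separated pairs costs O(N²) per attribute at each node, which is acceptable for a sketch but not tuned for large tables.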