Subset-conjunctive rules for breast cancer diagnosis

Authors:
Rajeev Kohli;Ramesh Krishnamurti;Kamel Jedidi
Affiliations:
Graduate School of Business, Columbia University;School of Computing Science, Faculty of Applied Sciences, Simon Fraser University, Burnaby, B.C., Canada;Graduate School of Business, Columbia University
Venue:
Discrete Applied Mathematics - Special issue: Discrete mathematics & data mining II (DM & DM II)
Year:
2006

Abstract

The objective of this study was to distinguish between patients with and without breast cancer within a population. The study was based on the University of Wisconsin's dataset of 569 patients, of whom 212 were subsequently found to have breast cancer. We describe a subset-conjunctive model, which is related to Logical Analysis of Data, that distinguishes between the two groups of patients based on the results of a non-invasive procedure called Fine Needle Aspiration, which physicians often use before deciding on the need for a biopsy. We formulate the problem of inferring subset-conjunctive rules as a 0-1 integer program, show that it is NP-hard, and prove that it admits no polynomial-time constant-ratio approximation algorithm. We examine the performance of a randomized algorithm and of randomization using LP rounding; in both cases, the expected performance ratio is arbitrarily bad. We use a deterministic greedy algorithm to identify a Pareto-efficient set of subset-conjunctive rules, and we describe how the rules change with a re-weighting of the type-I and type-II errors, how the best rule changes with the subset size, and how much of a tradeoff is required between the two types of error as one selects a more stringent or more lax classification rule. An important aspect of the analysis is that we find a sequence of closely related efficient rules, which can be readily used in a clinical setting because they are simple and have the same structure as the rules currently used in clinical diagnosis.
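
As a rough illustration of the kind of rule discussed in the abstract, the sketch below assumes that a subset-conjunctive rule flags a case as positive when at least k of a selected set of binary criteria are satisfied; the abstract itself does not spell out this definition. The function names, the error weights, and the toy data are hypothetical and are not taken from the paper or from the Wisconsin dataset.

```python
from typing import List, Sequence, Tuple


def subset_conjunctive_predict(row: Sequence[int], selected: Sequence[int], k: int) -> bool:
    """Assumed rule form: positive when at least k of the selected binary criteria hold."""
    return sum(row[j] for j in selected) >= k


def weighted_error(
    rows: List[Sequence[int]],
    labels: Sequence[int],
    selected: Sequence[int],
    k: int,
    w_fp: float = 1.0,  # hypothetical weight on false positives
    w_fn: float = 1.0,  # hypothetical weight on false negatives
) -> Tuple[float, int, int]:
    """Return (weighted cost, false positives, false negatives) for one rule."""
    fp = fn = 0
    for row, label in zip(rows, labels):
        pred = subset_conjunctive_predict(row, selected, k)
        if pred and not label:
            fp += 1
        elif not pred and label:
            fn += 1
    return w_fp * fp + w_fn * fn, fp, fn


# Made-up toy data: each row holds three binary criteria; label 1 means malignant.
rows = [[1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 0, 0], [1, 1, 1]]
labels = [1, 1, 0, 0, 1]

# Sweep the threshold k to see how the two error counts trade off.
for k in (1, 2, 3):
    cost, fp, fn = weighted_error(rows, labels, [0, 1, 2], k, w_fp=1.0, w_fn=2.0)
    print(f"k={k}: false positives={fp}, false negatives={fn}, weighted cost={cost}")
```

Raising k makes the rule more stringent (fewer false positives, more false negatives), and changing the two weights shifts which k gives the lowest cost. This only evaluates given rules; it is not the greedy algorithm or the integer program described in the abstract.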