Constructing Bayesian Networks to Predict Uncollectible Telecommunications Accounts

  • Authors:
  • Kazuo J. Ezawa; Steven W. Norton

  • Venue:
  • IEEE Expert: Intelligent Systems and Their Applications
  • Year:
  • 1996

Abstract

Every year the telecommunications industry incurs several billion dollars in uncollectible debt. Even though this is a small percentage of the more than 100 billion dollars in annual collectible revenues, it still represents a significant problem. Controlling uncollectibles is part of the larger risk management process, which aims to ensure customer satisfaction, reduce operating expenses, and maximize profitability. The ideal solution to controlling uncollectibles, of course, is to have an oracle unerringly identify customers who will not pay their bills or pinpoint phone calls that cannot be collected. It is impossible, however, to be absolutely certain about a customer or a call because there is no way to know a person's intent. Given that reality, the best you can do is to use a probability model to estimate the conditional probability of uncollectible debt from factors indicative of past uncollectibles. You can then apply the model to current data and feed the results into a normative decision-support system, which can evaluate a variety of actions, from inaction to call disconnect.

A key element of modern risk management is the ability to use large quantities of historical data to build models that assess the risk per customer or per transaction. In telecommunications, databases containing hundreds of millions or even billions of records are not uncommon, so efficiency becomes a primary concern. Independent probability models can be constructed efficiently but do not capture the dependencies between variables that are so important in this field. On the other hand, highly search-intensive learning algorithms that do address dependencies between variables, such as K2, are impractical in the telecommunications arena because of the vast amount of input data and the number of variables used to describe it. To address this need, we developed the Advanced Pattern Recognition and Identification (APRI) system.
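
The trade-off between independent and dependent models can be made concrete with a small sketch. The Python below is illustrative only and is not APRI's algorithm: it assumes two hypothetical binary indicators, `a` and `b`, and a fabricated toy data set in which an account is uncollectible exactly when one indicator is set but not the other. The fully independent (naive Bayes) model scores P(uncollectible | a, b) from P(a | y) and P(b | y) separately, while the dependent variant adds a single arc from `a` to `b`, in the style of a tree-augmented naive Bayes model, and uses P(b | a, y) instead.

```python
from collections import Counter
from itertools import product

# Illustrative sketch only; not APRI's algorithm. Names and data are hypothetical.

def fit_counts(records):
    """Collect, in one pass over (a, b, y) tuples, the counts both models need."""
    y_n, ay_n, by_n, bay_n = Counter(), Counter(), Counter(), Counter()
    for a, b, y in records:
        y_n[y] += 1
        ay_n[(a, y)] += 1        # for P(a | y)
        by_n[(b, y)] += 1        # for P(b | y), independent model
        bay_n[(b, a, y)] += 1    # for P(b | a, y), dependent model
    return y_n, ay_n, by_n, bay_n

def smoothed(count, total):
    """Laplace-smoothed probability estimate for a binary variable."""
    return (count + 1) / (total + 2)

def p_uncollectible(counts, a, b, dependent):
    """Posterior P(y = 1 | a, b) under the independent or the dependent model."""
    y_n, ay_n, by_n, bay_n = counts
    n = sum(y_n.values())
    joint = {}
    for y in (0, 1):
        p = smoothed(y_n[y], n) * smoothed(ay_n[(a, y)], y_n[y])
        if dependent:
            p *= smoothed(bay_n[(b, a, y)], ay_n[(a, y)])
        else:
            p *= smoothed(by_n[(b, y)], y_n[y])
        joint[y] = p
    return joint[1] / (joint[0] + joint[1])

# Fabricated toy data: y = 1 exactly when a != b (an XOR-style interaction).
records = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 25
counts = fit_counts(records)

for a, b in product((0, 1), repeat=2):
    indep = p_uncollectible(counts, a, b, dependent=False)
    dep = p_uncollectible(counts, a, b, dependent=True)
    print(f"a={a} b={b} independent={indep:.3f} dependent={dep:.3f}")
```

On this toy data each indicator is individually uninformative about the class, so the independent model returns 0.5 for every input, whereas the dependent model recovers the interaction (about 0.963 for a=0, b=1). Real account data is far less clean; the point is only that an independence assumption can discard exactly the signal that matters.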
APRI lets us automatically construct Bayesian network models for classification problems using extremely large databases. APRI's key strength is its ability to efficiently select relevant variables and dependencies to build conditionally dependent models. In fact, APRI reads the data from secondary storage at most five times during the entire model-building process. This is in sharp contrast to other Bayesian network learning systems, whose complexity grows linearly or quadratically with the number of input variables.

To evaluate APRI, we built four probability models: a fully independent model, a limited independent model, and two dependent models. We used data sets with four to six million records, representing 600 to 800 million bytes of data. One of the dependent models classified 37% of the uncollectible calls correctly, versus the 10% classified correctly by the fully independent model. These results confirm the necessity of learning dependent models even though the amount of data required seems prohibitive.
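
The abstract does not say how APRI selects variables, so the following is only a hedged illustration of why a small, fixed number of passes can suffice: class-conditional counts gathered in one sequential scan are enough to score every candidate variable against the class. Scoring by mutual information with the class is an assumed, commonly used criterion here, not a description of APRI, and the field names `prior_default`, `region`, and `uncollectible` are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch only; the scoring criterion and field names are assumptions.

def class_mutual_information(records, class_key):
    """Single pass over dict records; returns I(X; C) in bits for each non-class field."""
    n = 0
    class_n = Counter()   # counts of each class value
    value_n = Counter()   # (field, value) counts
    joint_n = Counter()   # (field, value, class) counts
    for rec in records:
        c = rec[class_key]
        n += 1
        class_n[c] += 1
        for field, value in rec.items():
            if field != class_key:
                value_n[(field, value)] += 1
                joint_n[(field, value, c)] += 1
    # Sum p(x, c) * log2(p(x, c) / (p(x) p(c))) over observed (x, c) pairs.
    mi = Counter()
    for (field, value, c), k in joint_n.items():
        p_xc = k / n
        p_x = value_n[(field, value)] / n
        p_c = class_n[c] / n
        mi[field] += p_xc * math.log2(p_xc / (p_x * p_c))
    return dict(mi)

# Fabricated toy records: prior_default tracks the class, region does not.
records = [
    {"prior_default": y, "region": r, "uncollectible": y}
    for y in (0, 1)
    for r in ("east", "west")
]
mi = class_mutual_information(iter(records), class_key="uncollectible")
ranked = sorted(mi.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Because the records are consumed as an iterator, the function reads them exactly once; with a real data set the iterator would stream rows from secondary storage. On the toy records above, `prior_default` scores 1.0 bit and `region` scores 0.0, so the ranking keeps the informative field first.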