Identity matching and information acquisition: Estimation of optimal threshold parameters

  • Authors:
  • Pantea Alirezazadeh;Fidan Boylu;Robert Garfinkel;Ram Gopal;Paulo Goes

  • Affiliations:
  • Department of Operations and Information Management, University of Connecticut, United States;Department of Operations and Information Management, University of Connecticut, United States;Department of Operations and Information Management, University of Connecticut, United States;Department of Operations and Information Management, University of Connecticut, United States;Department of Management Information Systems, University of Arizona, United States

  • Venue:
  • Decision Support Systems
  • Year:
  • 2014

Abstract

As customer interactions shift toward online channels and the volume of collected and stored data grows, an important challenge for today's businesses is dealing appropriately with data quality problems. A key step in the data cleaning process is the matching and merging of customer records to establish the identity of individuals. The practical importance of this research is exemplified by a large client firm that deals with private label credit cards. The firm needed to know whether new customers already had histories within the company in order to decide on the appropriate parameters of possible card offerings. The company incurs substantial costs if it incorrectly "matches" an incoming application with an existing customer (Type I error), and also if it falsely assumes that there is no match (Type II error). While a good deal of generic identity matching software is available that provides a "strength" score for each potential match, the question of how to use the scores for new applications is of great interest and is addressed in this work. The academic significance lies in the analysis of the score thresholds that are typically used in decision making. That is, upper and lower thresholds are set: matches are accepted above the former, rejected below the latter, and more information is gathered between the two. We show, for the first time, that the optimal thresholds can be considered to be parameters of a matching distribution, and a number of estimators of these parameters are developed and analyzed. Extensive computations then show the effects of various factors on the convergence rates of the estimates.
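The two-threshold rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimation method, only the decision rule that the thresholds feed into. The function name `decide`, the outcome labels, the 0 to 1 score scale, and the example threshold values are all assumptions made for illustration. The abstract does not say how a score exactly equal to a threshold is handled; here such scores are assumed to fall in the "gather more information" region.

```python
def decide(score: float, lower: float, upper: float) -> str:
    """Three-way decision on a match-strength score (hypothetical helper).

    Accept the match above the upper threshold, reject it below the
    lower threshold, and otherwise request more information.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score > upper:
        return "accept"            # treat the application as an existing customer
    if score < lower:
        return "reject"            # treat the application as a new customer
    # Assumption: scores equal to either threshold land here as well.
    return "gather_more_info"


if __name__ == "__main__":
    # Illustrative thresholds only; the paper estimates optimal values.
    for s in (0.95, 0.50, 0.10):
        print(s, decide(s, lower=0.3, upper=0.8))
```

Raising `upper` reduces false matches (Type I errors) and lowering `lower` reduces missed matches (Type II errors), each at the price of sending more applications to the information-gathering step.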