Estimation of quality of service in spelling correction using Kullback-Leibler divergence

  • Authors:
  • Cihan Varol; Coskun Bayrak

  • Affiliations:
  • Computer Science Department, Sam Houston State University, 1903 Ave. I, Huntsville, TX 77341, USA; Computer Science Department, University of Arkansas at Little Rock, 2801 S. University Ave., Little Rock, AR 72212, USA

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Abstract

To assist companies dealing with data preparation problems, we develop an approach for handling dirty data. Cleaning customer records and producing the desired results requires a set of effective tools applied in sequence, such as the near-miss strategy, phonetic structure, and edit distance, to produce a suggestion table. The best match is selected and validated by its frequency in 20th-century Census Bureau statistics. Although the experiments yielded better correction rates than the well-known ASPELL, JSpell HTML, and Ajax Spell Checkers, a remaining challenge is to introduce a quality estimate for our Personal Name Recognizing Strategy Model (PNRS) that distinguishes between the submitted original names and the names suggested by PNRS. Here, we implement a statistical distance metric as a quality measure by computing the Kullback-Leibler (K-L) distance. The K-L distance measures the distance between the probability density function of the original names and that of the suggested names estimated by PNRS, and so assesses to what degree our edit-distance strategy has succeeded in correcting names. All names submitted as input to the PNRS model lie within a maximum edit distance of 2 of the original name. The Kullback-Leibler distance thus serves as an indicator of name-recognition quality.
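
As an illustration of the quality measure described in the abstract, the K-L distance between two discrete distributions P and Q is D(P || Q) = sum over x of P(x) * log(P(x) / Q(x)); it is zero when the two distributions coincide and grows as they diverge. The following is a minimal sketch, not the authors' implementation. It assumes that the distributions are empirical name frequencies over the union of both name lists, and that additive (Laplace) smoothing is used to avoid division by zero. The function names, the smoothing constant `alpha`, and the sample names are all hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(items, vocabulary, alpha=1.0):
    """Smoothed relative frequency of each vocabulary entry in `items`.

    Additive smoothing (alpha > 0) is an assumption made here so that every
    entry has non-zero probability and the K-L distance stays finite.
    """
    counts = Counter(items)
    total = len(items) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


def kl_divergence(p, q):
    """D(P || Q) in nats, for two distributions over the same keys."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)


# Hypothetical data: the intended names and the names a corrector suggested.
original_names = ["SMITH", "JOHNSON", "SMITH", "WILLIAMS"]
suggested_names = ["SMITH", "JOHNSON", "SMYTH", "WILLIAMS"]

vocabulary = sorted(set(original_names) | set(suggested_names))
p = empirical_distribution(original_names, vocabulary)
q = empirical_distribution(suggested_names, vocabulary)

print("K-L distance:", kl_divergence(p, q))
```

Under these assumptions, a corrector that reproduces every original name exactly yields a distance of zero, and each miscorrection moves the value upward. Note that the K-L distance is not symmetric, so D(P || Q) and D(Q || P) generally differ.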