Estimation of quality of service in spelling correction using Kullback-Leibler divergence

Authors:
Cihan Varol;Coskun Bayrak
Affiliations:
Computer Science Department, Sam Houston State University, 1903 Ave. I, Huntsville, TX 77341, USA;Computer Science Department, University of Arkansas at Little Rock, 2801 S. University Ave., Little Rock, AR 72212, USA
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

To assist companies dealing with data preparation problems, we develop an approach for handling dirty data. Cleaning customer records and producing the desired results requires a set of effective tools applied in sequence, such as the near-miss strategy, phonetic structure, and edit distance, which together produce a suggestion table. The best match is then selected and validated by its frequency of occurrence in the 20th-century Census Bureau statistics. Although the conducted experiments yielded better correction rates than the well-known ASPELL, JSpell HTML, and Ajax spell checkers, a remaining challenge is to introduce a quality estimate for our Personal Name Recognizing Strategy Model (PNRS) that distinguishes between the submitted original names and the names suggested by PNRS. Here, we implement a statistical distance metric as a quality measure by computing the Kullback-Leibler (K-L) distance. The K-L distance measures the difference between the probability density function of the original names and that of the suggested names estimated by PNRS, and so assesses to what degree our edit-distance strategy has succeeded in correcting names. All names submitted as input to the PNRS model were within a maximum edit distance of 2 of the original name. The Kullback-Leibler distance thus serves as an indicator of name recognition quality.
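
The quality measure described in the abstract can be illustrated with a minimal sketch. It assumes the "probability density functions" are discrete relative-frequency distributions over names and uses the standard definition D(P || Q) = sum over x of P(x) log(P(x) / Q(x)). The name lists, the function names `name_distribution` and `kl_divergence`, and the additive smoothing constant `alpha` are all hypothetical choices made for illustration; the abstract does not specify how PNRS estimates the distributions or handles zero counts.

```python
import math
from collections import Counter


def name_distribution(names, vocabulary, alpha=1.0):
    """Estimate a smoothed probability for each name in `vocabulary`.

    `alpha` is an additive (Laplace) smoothing constant. It is an assumption
    made here so that no probability is zero; the abstract does not say how
    zero counts are handled.
    """
    counts = Counter(names)
    total = len(names) + alpha * len(vocabulary)
    return {name: (counts[name] + alpha) / total for name in vocabulary}


def kl_divergence(p, q):
    """Return D(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)


# Hypothetical data: the intended names and the names a corrector suggested.
original = ["SMITH", "JOHNSON", "SMITH", "WILLIAMS", "BROWN", "SMITH"]
suggested = ["SMITH", "JOHNSON", "SMYTH", "WILLIAMS", "BROWN", "SMITH"]

# Both distributions are defined over the same set of names.
vocabulary = sorted(set(original) | set(suggested))
p = name_distribution(original, vocabulary)
q = name_distribution(suggested, vocabulary)

divergence = kl_divergence(p, q)
print(f"D(original || suggested) = {divergence:.4f}")
```

A value near zero indicates that the suggested names are distributed almost like the original names, and larger values indicate poorer correction. Note that the K-L divergence is not symmetric, so D(P || Q) and D(Q || P) generally differ, and it is not a metric in the strict mathematical sense.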