To assist companies dealing with data preparation problems, we develop an approach for handling dirty data. Cleaning customer records and producing the desired results requires a set of effective tools applied in sequence, such as the near-miss strategy, phonetic structure, and edit distance, which together produce a suggestion table. The best match is selected and validated by its frequency of occurrence in the 20th-century Census Bureau statistics. Although the experiments yielded better correction rates than the well-known ASPELL, JSpell HTML, and Ajax Spell Checkers, a remaining challenge is to introduce a quality estimate for our Personal Name Recognizing Strategy Model (PNRS) that distinguishes between the submitted original names and the names suggested by PNRS. Here, we implement a statistical distance metric as a quality measure by computing the Kullback-Leibler (K-L) distance. The K-L distance measures the distance between the probability density function of the original names and that of the suggested names estimated by PNRS, and so assesses to what degree our edit-distance strategy has succeeded in correcting names. All names submitted as inputs to the PNRS model lie within a maximum edit distance of 2 of the original name. The K-L distance thus serves as an indicator of name-recognition quality.
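As a rough illustration of the quality measure described above, the following minimal sketch computes the K-L distance D(P || Q) = sum over names of P(n) * log(P(n) / Q(n)) between two empirical name distributions. It is not the PNRS implementation: the helper names `name_distribution` and `kl_distance`, the toy name lists, and the small additive smoothing constant `epsilon` (used so that names absent from one list do not cause a division by zero) are all assumptions made for this example.

```python
import math
from collections import Counter


def name_distribution(names, vocabulary, epsilon=1e-9):
    """Estimate a probability for every name in `vocabulary` from a list of names.

    `epsilon` is an assumed additive smoothing constant, not a value from the paper.
    """
    counts = Counter(names)
    total = len(names) + epsilon * len(vocabulary)
    return {name: (counts[name] + epsilon) / total for name in vocabulary}


def kl_distance(p, q):
    """Kullback-Leibler distance D(p || q) in nats; p and q share the same keys."""
    return sum(p[name] * math.log(p[name] / q[name]) for name in p if p[name] > 0)


# Hypothetical data: original names and the names a corrector suggested for them.
original = ["smith", "johnson", "smith", "williams"]
suggested = ["smith", "johnson", "smyth", "williams"]

vocabulary = sorted(set(original) | set(suggested))
p = name_distribution(original, vocabulary)
q = name_distribution(suggested, vocabulary)

kl = kl_distance(p, q)
print(f"K-L distance between original and suggested names: {kl:.4f}")
```

A value of zero means the two distributions coincide, and larger values indicate that the suggested names diverge further from the originals. Note that the K-L distance is not symmetric, so D(P || Q) and D(Q || P) generally differ.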