Country wise classification of human names

  • Authors: Raju Balakrishnan

  • Affiliations: India Software Lab, IBM™, Bangalore, India

  • Venue: AIKED'06 Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases
  • Year: 2006

Abstract

Person names in a country follow a particular statistical trend: the names of a large set of individuals in a country are drawn from a much smaller set of names. The frequency distributions of person names differ from country to country. The intuitive ability of humans to guess a person's country of origin from the name rests on these facts. It is possible to design a data mining approach that decides a person's country of origin from the name, using the first name and second name as the only independent parameters, and such a tool has a wide range of applications. This problem is nevertheless unexplored, perhaps because of its complexity and the lack of information about human names across different countries. In this paper we tackle the problem with two data mining algorithms. First, we try k-nearest neighbor classification of first names and second names, followed by rule-based decision making. The algorithm is trained and tested on person names from nine countries. This method shows accuracy of up to 73% for a set of ten countries. Second, we try an unsupervised method that improves the knowledge base of the system at runtime. This algorithm can effectively handle scenarios in which (1) the training set is small and (2) the a priori probabilities of the working set are unknown at training time. The method shows accuracy of up to 64% for nine countries.
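
The first approach in the abstract (k-nearest neighbor classification of first and second names, followed by a rule-based decision) can be illustrated with a minimal sketch. The abstract does not state the features, the distance measure, or the rules used in the paper, so everything specific here is an assumption made for illustration: the character-bigram Jaccard distance, the value k = 3, the tie-breaking rule that favors the second name, and the tiny hypothetical name lists.

```python
from collections import Counter

# Hypothetical toy training data; not taken from the paper.
FIRST_NAMES = [("raju", "India"), ("ravi", "India"), ("rajesh", "India"),
               ("hans", "Germany"), ("heinrich", "Germany"), ("helmut", "Germany")]
SECOND_NAMES = [("sharma", "India"), ("verma", "India"), ("varma", "India"),
                ("schmidt", "Germany"), ("schmitt", "Germany"), ("schneider", "Germany")]


def bigrams(name):
    """Set of character bigrams, with ^ and $ marking the name boundaries."""
    padded = f"^{name.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def jaccard_distance(a, b):
    """Assumed distance: 1 minus the Jaccard similarity of the bigram sets."""
    x, y = bigrams(a), bigrams(b)
    return 1.0 - len(x & y) / len(x | y)


def knn_votes(name, labelled, k=3):
    """Count the country labels of the k labelled names closest to `name`."""
    nearest = sorted(labelled, key=lambda item: jaccard_distance(name, item[0]))[:k]
    return Counter(country for _, country in nearest)


def classify(first, second, first_names=FIRST_NAMES, second_names=SECOND_NAMES, k=3):
    """Run k-NN on each name part, then combine the two results with simple rules."""
    first_votes = knn_votes(first, first_names, k)
    second_votes = knn_votes(second, second_names, k)
    first_best = first_votes.most_common(1)[0][0]
    second_best = second_votes.most_common(1)[0][0]
    # Rule 1 (assumed): if both name parts agree, accept that country.
    if first_best == second_best:
        return first_best
    # Rule 2 (assumed): otherwise pool the votes; on a tie prefer the second name.
    combined = first_votes + second_votes
    top = max(combined.values())
    tied = [country for country, votes in combined.items() if votes == top]
    return second_best if second_best in tied else tied[0]


print(classify("rajiv", "sharma"))
```

With these toy lists, a name whose two parts point to the same country is settled by the first rule, while a mixed name such as `classify("raju", "schmidt")` falls through to the pooled vote. A real system would need far larger name lists per country and rules tuned on held-out data.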