A divisive ordering algorithm for mapping categorical data to numeric data

  • Authors: Huang-Cheng Kuo
  • Affiliations: Department of Computer Science and Information Engineering, National Chiayi University, Chiayi City, Taiwan
  • Venue: KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II
  • Year: 2005

Abstract

The computing time for K nearest neighbor search is linear in the size of the dataset if the dataset is not indexed. This is unacceptable for on-line applications with time constraints when the dataset is large. However, if the dataset contains categorical attributes, an index cannot be built on it directly. One possible solution is to convert the categorical attributes into numeric attributes: the categories are ordered and then mapped to numeric values. In this paper, we propose a new heuristic ordering algorithm and compare it with two previously proposed algorithms that borrow ideas from minimal spanning trees. The new algorithm divisively builds a binary tree by recursively partitioning the categories; an in-order traversal of the tree then yields an ordering of the categories. After mapping and indexing, we can efficiently retrieve a small portion of the dataset and perform K nearest neighbor search on that portion, at the cost of a small loss in accuracy. Experiments show that the divisive ordering algorithm performs better than the other two algorithms.
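
The pipeline described in the abstract (recursive partitioning into a binary tree, in-order traversal, mapping to numbers) can be illustrated with a minimal Python sketch. The abstract does not state how each partition is chosen, so the split rule below is an assumption made only for illustration: the two most distant categories act as seeds and every other category joins the nearer seed. The names `build_tree`, `in_order`, `map_to_numeric`, the toy distance table, and the use of consecutive integers as numeric values are likewise hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple, Union

# A leaf is a category name; an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Distance = Callable[[str, str], float]


def build_tree(cats: List[str], dist: Distance) -> Tree:
    """Divisively build a binary tree over distinct categories."""
    if len(cats) == 1:
        return cats[0]
    # ASSUMED split rule (not from the paper): seed the two halves
    # with the pair of categories that are farthest apart.
    a, b = max(combinations(cats, 2), key=lambda p: dist(p[0], p[1]))
    left, right = [a], [b]
    for c in cats:
        if c not in (a, b):
            # Each remaining category joins the nearer seed.
            (left if dist(c, a) <= dist(c, b) else right).append(c)
    return (build_tree(left, dist), build_tree(right, dist))


def in_order(tree: Tree) -> List[str]:
    """In-order traversal; categories sit only at the leaves."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return in_order(left) + in_order(right)


def map_to_numeric(ordering: List[str]) -> Dict[str, int]:
    """ASSUMED mapping: consecutive integers in traversal order."""
    return {cat: rank for rank, cat in enumerate(ordering)}


# Toy, made-up distances between four categories (symmetric).
_pairs = {
    frozenset(("red", "orange")): 1.0,
    frozenset(("red", "blue")): 5.0,
    frozenset(("red", "navy")): 6.0,
    frozenset(("orange", "blue")): 4.0,
    frozenset(("orange", "navy")): 5.0,
    frozenset(("blue", "navy")): 1.0,
}


def toy_dist(x: str, y: str) -> float:
    return 0.0 if x == y else _pairs[frozenset((x, y))]


tree = build_tree(["red", "orange", "blue", "navy"], toy_dist)
ordering = in_order(tree)
mapping = map_to_numeric(ordering)
print(ordering)  # ['red', 'orange', 'navy', 'blue']
print(mapping)   # {'red': 0, 'orange': 1, 'navy': 2, 'blue': 3}
```

With the numeric values in place, the attribute can be indexed like any other numeric attribute. The quality of the resulting ordering depends entirely on the partitioning heuristic, which is where this sketch may differ most from the proposed algorithm.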