An automatic code classification system by using memory-based learning and information retrieval technique

  • Authors:
  • Heui Seok Lim;Won Kyu Hoon Lee;Hyeon Chul Kim;Soon Young Jeong;Heon Chang Yu

  • Affiliations:
  • Dept. of Software, Hanshin University, Korea;Dept. of Computer Science Education, Korea University, Korea;Dept. of Computer Science Education, Korea University, Korea;Dept. of Computer Science Education, Korea University, Korea;Dept. of Computer Science Education, Korea University, Korea

  • Venue:
  • AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
  • Year:
  • 2005

Abstract

This paper proposes an automatic code classification system for Korean census data that combines an information retrieval technique with memory-based learning. The purpose of the system is to convert natural-language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. The system was trained by memory-based learning and tested on 46,762 industry records and 36,286 occupation records. It was evaluated using 10-fold cross-validation. In the experiments, the proposed system showed production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
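
The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes that "memory-based learning" can be illustrated by a k-nearest-neighbour lookup over stored training records, and that the "information retrieval technique" can be illustrated by TF-IDF weighting with cosine similarity. The training records, the codes, and the names `MemoryBasedCoder` and `classify` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy records: (free-text response, numeric code).
# Texts and codes are invented for illustration; they are not census data.
TRAIN = [
    ("rice farming", "01"),
    ("vegetable farming", "01"),
    ("car parts manufacturing", "30"),
    ("truck engine manufacturing", "30"),
    ("elementary school teaching", "85"),
    ("high school teaching", "85"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real morphological analysis)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: f * idf[t] for t, f in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class MemoryBasedCoder:
    """Keeps every training record in memory and classifies by nearest neighbours."""

    def __init__(self, records):
        docs = [tokenize(text) for text, _ in records]
        self.idf = build_idf(docs)
        self.memory = [
            (vectorize(tokens, self.idf), code)
            for tokens, (_, code) in zip(docs, records)
        ]

    def classify(self, text, k=3):
        """Return the code with the largest similarity-weighted vote among the
        k most similar stored records, or None if nothing matches."""
        query = vectorize(tokenize(text), self.idf)
        scored = sorted(
            ((cosine(query, vec), code) for vec, code in self.memory),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        votes = Counter()
        for score, code in scored:
            votes[code] += score
        if not votes or max(votes.values()) == 0.0:
            return None  # no overlap with any stored record
        return votes.most_common(1)[0][0]


coder = MemoryBasedCoder(TRAIN)
print(coder.classify("farming of rice"))
print(coder.classify("bus engine manufacturing"))
```

Returning `None` when no stored record overlaps with the response is one simple way to leave a record uncoded for manual handling; whether the original system treats unmatched responses this way is not stated in the abstract.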