An application of information retrieval technique to automated code classification

  • Authors:
  • Heui Seok Lim; Seong Hoon Lee

  • Affiliations:
  • Dept. of Software, Hanshin University, Korea; Dept. of Information and Communications, Cheonan University, Korea

  • Venue:
  • KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
  • Year:
  • 2005

Abstract

This paper describes an application of information retrieval techniques to automated industry and occupation code classification for Korean Census records. The proposed system converts natural-language responses on survey questionnaires into the corresponding numeric codes defined in the standard code book of the Census Bureau. The system was evaluated on 46,762 industry records and 36,286 occupation records using 10-fold cross-validation. In the experiments, the system achieved production rates of 87.08% and 66.08% when classifying industry records into level 2 and level 5 codes, respectively. In semi-automated mode, it achieved production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
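The abstract does not specify the retrieval model, so the following is only a minimal sketch under stated assumptions. Each coded training record is indexed as a TF-IDF vector, a new response is compared with every record by cosine similarity, and each code is scored by its best-matching record. The top-ranked code stands in for automatic assignment, and the short candidate list stands in for what a human coder might review in semi-automated mode. The class `CodeRetriever`, the helper `to_level`, whitespace tokenization (Korean responses would normally need morphological analysis), max-similarity scoring, and the five-digit codes in the example are all hypothetical, not taken from the paper or the official code book.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization; real Korean survey
    # responses would normally go through a morphological analyzer.
    return text.lower().split()


def to_level(code, digits):
    # Hypothetical: assumes a higher-level code is a digit prefix of a lower-level one.
    return code[:digits]


class CodeRetriever:
    """Hypothetical sketch: rank candidate codes for a free-text response by
    retrieving similar, already-coded records (TF-IDF + cosine similarity)."""

    def __init__(self, records):
        # records: list of (response_text, code) pairs from a coded training set
        self.codes = [code for _, code in records]
        docs = [Counter(tokenize(text)) for text, _ in records]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF; terms unseen in training get weight 0 at query time.
        self.idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
        self.vectors = [self._weight(doc) for doc in docs]

    def _weight(self, tf):
        # TF-IDF weights, L2-normalized so a dot product equals cosine similarity.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def rank_codes(self, text, k=5):
        query = self._weight(Counter(tokenize(text)))
        best = {}
        for vec, code in zip(self.vectors, self.codes):
            sim = sum(w * vec.get(t, 0.0) for t, w in query.items())
            # Score each code by its single most similar training record.
            best[code] = max(sim, best.get(code, 0.0))
        ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
        # Return up to k codes that share at least one weighted term with the query.
        return [code for code, score in ranked[:k] if score > 0]


# Made-up training records and codes, for illustration only.
train = [
    ("rice farming on family land", "10101"),
    ("growing vegetables in greenhouse", "10102"),
    ("software development for banks", "20301"),
    ("web application programming", "20301"),
]
retriever = CodeRetriever(train)

candidates = retriever.rank_codes("vegetables greenhouse rice")
automatic_code = candidates[0] if candidates else None  # automatic mode: take the top code
review_list = candidates[:3]                            # semi-automated mode: a coder picks
```

Truncating codes with `to_level` assumes the hierarchy is encoded as digit prefixes; comparing predictions at level 2 versus level 5 this way is an illustrative assumption, not a description of the paper's evaluation.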