Automatic gazette creation for named entity recognition and application to resume processing

  • Authors:
  • Sachin Pawar;Rajiv Srivastava;Girish Keshav Palshikar

  • Affiliations:
  • Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India;Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India;Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India

  • Venue:
  • Proceedings of the 5th ACM COMPUTE Conference: Intelligent & scalable system technologies
  • Year:
  • 2012

Abstract

Named entities are important content-carrying units within documents. Consequently, named entity recognition (NER) is an important part of information extraction. One fast and accurate approach to NER uses a list, or gazette, of known instances. The gazette creation problem considers how to automatically create a comprehensive gazette from a given unlabeled document repository. We describe an unsupervised algorithm for automatic gazette creation, which is modified from [5]. We propose a fast NER algorithm that uses a large gazette and show that it significantly outperforms a naïve approach based on regular expressions. We describe experimental results obtained by using the system to create gazettes for various resume-related named entities (e.g., ORG, DEGREE, EDUCATIONAL_INSTITUTE, DESIGNATION) and to perform the associated NER on a large set of real-life resumes.
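
The abstract does not spell out the gazette-based NER algorithm, so the following is only a minimal sketch of one common way to match a large gazette quickly: a token-level trie with greedy longest match. It should not be read as the authors' method. The names `build_trie`, `tag`, and `END`, as well as the sample gazette entries, are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical sentinel key marking "a gazette entry ends here".
# It is not a string, so it can never collide with a real token.
END = object()


def build_trie(gazette: Dict[str, str]) -> dict:
    """Build a token-level trie from {entry phrase: entity type}."""
    root: dict = {}
    for phrase, label in gazette.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[END] = label
    return root


def tag(tokens: List[str], trie: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans, preferring the longest match."""
    spans = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        # Walk the trie as far as the text allows, remembering the
        # longest complete gazette entry seen along the way.
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if END in node:
                best = (i, j, node[END])
        if best is not None:
            spans.append(best)
            i = best[1]  # resume after the matched entry
        else:
            i += 1
    return spans


# Illustrative gazette; entries are assumptions, not data from the paper.
gazette = {
    "Tata Research Development and Design Centre": "ORG",
    "Software Engineer": "DESIGNATION",
    "Software": "SKILL",
}
trie = build_trie(gazette)
tokens = "Worked as Software Engineer at Tata Research Development and Design Centre".split()
spans = tag(tokens, trie)
```

Each token is visited a bounded number of times relative to the longest gazette entry, so the cost of tagging does not grow with the number of entries. A naïve approach that tries one regular expression per entry, by contrast, scales with gazette size.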