Semantic Based Text Classification of Patent Documents to a User-Defined Taxonomy

  • Authors:
  • Ashish Sureka;Pranav Prabhakar Mirajkar;Prasanna Nagesh Teli;Girish Agarwal;Sumit Kumar Bose

  • Affiliations:
  • Software Engineering and Technology Labs (SETLabs), Infosys Technologies Ltd., India;Software Engineering and Technology Labs (SETLabs), Infosys Technologies Ltd., India;Software Engineering and Technology Labs (SETLabs), Infosys Technologies Ltd., India;Software Engineering and Technology Labs (SETLabs), Infosys Technologies Ltd., India;Software Engineering and Technology Labs (SETLabs), Infosys Technologies Ltd., India

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

We present a generic approach for semantic-based classification of text documents into pre-defined categories. The technique is applied to the domain of patent analytics, where it classifies a collection of patent documents to one or more nodes in a user-defined taxonomy. The approach is a multi-step process consisting of noun extraction, word sense disambiguation, computation of semantic relatedness between pairs of words using WordNet, and confidence score computation. The algorithm achieved good accuracy on an experimental dataset and can be easily adapted and customized to domains other than the patent landscape analysis domain discussed in this paper.
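
The four steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which relatedness measure, disambiguation method, or scoring formula the paper uses. The sketch assumes a small hand-written is-a hierarchy (`HYPERNYM`) and sense inventory (`SENSES`) in place of WordNet, a lexicon lookup in place of a part-of-speech tagger, a path-based relatedness of 1 / (1 + shortest path length), and a confidence score equal to the mean best relatedness of the document's nouns to a category's keywords. All names, data, and the threshold are hypothetical.

```python
# Toy is-a hierarchy standing in for WordNet (hypothetical data): sense -> parent.
HYPERNYM = {
    "entity": None,
    "artifact": "entity", "substance": "entity", "group": "entity",
    "device": "artifact", "vehicle": "artifact",
    "car": "vehicle", "engine": "device",
    "battery.power": "device", "battery.military": "group",
    "fuel": "substance", "drug": "substance", "medicine": "drug",
}

# Toy sense inventory (hypothetical): noun -> candidate senses.
SENSES = {
    "battery": ["battery.power", "battery.military"],
    "car": ["car"], "engine": ["engine"], "vehicle": ["vehicle"],
    "fuel": ["fuel"], "drug": ["drug"], "medicine": ["medicine"],
}


def path_to_root(sense):
    """Return the chain of senses from `sense` up to the root."""
    path = []
    while sense is not None:
        path.append(sense)
        sense = HYPERNYM[sense]
    return path


def path_similarity(a, b):
    """Path-based relatedness: 1 / (1 + shortest is-a path length)."""
    up_b = {s: i for i, s in enumerate(path_to_root(b))}
    for i, s in enumerate(path_to_root(a)):
        if s in up_b:  # first shared ancestor is the lowest one in a tree
            return 1.0 / (1 + i + up_b[s])
    return 0.0


def extract_nouns(text):
    """Step 1 (assumed stand-in for a POS tagger): keep tokens found in SENSES."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t in SENSES]


def disambiguate(word, context):
    """Step 2: pick the sense best supported by the other nouns in the text."""
    others = [w for w in context if w != word]

    def support(sense):
        return sum(max(path_similarity(sense, s) for s in SENSES[w]) for w in others)

    return max(SENSES[word], key=support)


def confidence(text, keywords):
    """Steps 3-4: mean best relatedness between document nouns and keywords."""
    nouns = extract_nouns(text)
    keyword_senses = [s for k in keywords for s in SENSES.get(k, [])]
    if not nouns or not keyword_senses:
        return 0.0
    total = 0.0
    for noun in nouns:
        sense = disambiguate(noun, nouns)
        total += max(path_similarity(sense, k) for k in keyword_senses)
    return total / len(nouns)


def classify(text, taxonomy, threshold=0.4):
    """Assign the text to every taxonomy node whose confidence meets the threshold."""
    return [node for node, kws in taxonomy.items() if confidence(text, kws) >= threshold]


# Hypothetical user-defined taxonomy: node name -> descriptive keywords.
TAXONOMY = {"Automotive": ["vehicle", "engine"], "Pharma": ["drug", "medicine"]}
SAMPLE = "A car engine powered by a battery and fuel."
```

Calling `classify(SAMPLE, TAXONOMY)` scores the sample sentence against each node and returns every node above the threshold, so a document can land in one, several, or no nodes. Replacing the toy hierarchy with WordNet and the lexicon lookup with a real tagger would leave the overall structure unchanged.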