Patent classification of the new invention using PLSA

  • Authors:
  • Ranjeet Kumar;Shrishail Math;R. C. Tripathi;M. D. Tiwari

  • Affiliations:
  • IIIT-ALLAHABAD, Deoghat Jhalwa;IIIT-ALLAHABAD, Deoghat Jhalwa;IIIT-ALLAHABAD, Deoghat Jhalwa;IIIT-ALLAHABAD, Deoghat Jhalwa

  • Venue:
  • Proceedings of the First International Conference on Intelligent Interactive Technologies and Multimedia
  • Year:
  • 2010

Abstract

In today's research and development leading to patenting, classifying content by the subject areas to which it belongs is a challenging task. This is because the novelty of current R&D rarely lies in a single technical area; it lies in a unique combination of several. For example, a typical ICT patent may advance knowledge in some combination of control engineering, electronic components, database technology, information retrieval methodology, Internet and wireless technology, and speech, signal, and image processing. This paper reports work on content classification for a newly drafted patent document using probabilistic latent semantic analysis (PLSA). PLSA is used for automated indexing of the documents: an indexer tokenizes the documents and builds a generative model. A singular value decomposition model is used to reduce the size of the term-document matrix and the term co-occurrences it records. The objective is to use a large corpus of past patent documents to categorize documents according to the generated concept model. The approach is illustrated and tested on an example classification of content for two typical US patent classes, and was found to work well for them.
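The pipeline named in the abstract (tokenize, build a term-document matrix, compact it with a singular value decomposition, then categorize a new document) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy documents, the class labels `signal_processing` and `databases`, the helper names, and the nearest-centroid cosine rule are all assumptions made for the example. It covers only the SVD compaction step and does not fit the PLSA generative model itself, which would normally be estimated with expectation-maximization.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs, vocab):
    """Return a (terms x documents) count matrix; out-of-vocabulary terms are ignored."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            if term in index:
                X[index[term], j] = count
    return X


# Hypothetical toy corpus standing in for past patent documents of two classes.
train_docs = [
    "speech signal filter noise",
    "image signal filter processing",
    "database query index retrieval",
    "database index storage query",
]
train_labels = np.array(
    ["signal_processing", "signal_processing", "databases", "databases"]
)

vocab = sorted({t for d in train_docs for t in tokenize(d)})
X = term_document_matrix(train_docs, vocab)

# Truncated SVD: keep the k strongest latent dimensions of the term space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # assumed number of latent concepts for this toy example
U_k = U[:, :k]


def project(docs):
    """Map documents into the k-dimensional latent space, one row per document."""
    return term_document_matrix(docs, vocab).T @ U_k


train_vecs = project(train_docs)
centroids = {
    label: train_vecs[train_labels == label].mean(axis=0)
    for label in sorted(set(train_labels))
}


def cosine(a, b):
    """Cosine similarity with a small constant to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def classify(text):
    """Assign the class whose latent-space centroid is most similar to the text."""
    v = project([text])[0]
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


print(classify("noise filter for speech signal"))
```

In this sketch a new document is folded into the latent space with the same projection used for the training documents, so words it shares with one class pull it toward that class's centroid. A real patent corpus would need stop-word handling, term weighting, and a larger `k`; those choices are not specified in the abstract.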