Classification of research papers into a patent classification system using two translation models

Authors:
Hidetsugu Nanba; Toshiyuki Takezawa
Affiliations:
Hiroshima City University, Hiroshima, Japan; Hiroshima City University, Hiroshima, Japan
Venue:
NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
Year:
2009

Abstract

Classifying research papers into a patent classification system enables exhaustive and effective invalidity searches, prior art searches, and technical trend analyses. However, classifying research papers manually is very costly. We have therefore studied the automatic classification of research papers into a patent classification system. Such classification must take into account the differences between the terms used in research papers and those used in patents, because patent terms are often more abstract or creative than research paper terms in order to widen the scope of the claims. Exhaustive searches and analyses also require classifying research papers written in various languages. To address these problems, we propose classification methods that use two machine translation models. When translating English research papers into Japanese, a translation model for patents performs worse than one for research papers because of these terminological differences. However, the model for patents is thought to be useful for our task, because its translation results tend to contain more patent terms than those of the model for research papers. To confirm the effectiveness of our methods, we conducted experiments using the data of the Patent Mining Task in the NTCIR-7 Workshop. The experimental results show that our method using translation models for both research papers and patents was more effective than a method using a single translation model.
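
The idea of combining two translation models can be illustrated with a minimal Python sketch. The abstract does not say how classification codes are assigned after translation, so the sketch assumes a simple nearest-neighbor vote over labeled patents by term overlap; this is an illustrative stand-in, not the authors' method. The toy dictionaries, the helper names (`make_translator`, `combined_terms`, `classify`), the underscore-joined terms, and the codes `CODE_A` and `CODE_B` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A "translation model" is reduced here to a function from text to a list of
# target-language terms. Real models would be statistical MT systems.
Translator = Callable[[str], List[str]]


def make_translator(dictionary: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a lookup table (assumption)."""
    def translate(text: str) -> List[str]:
        return [dictionary[w] for w in text.lower().split() if w in dictionary]
    return translate


def combined_terms(text: str, paper_model: Translator,
                   patent_model: Translator) -> Counter:
    """Merge the term bags produced by both translation models."""
    return Counter(paper_model(text)) + Counter(patent_model(text))


def classify(query: Counter,
             labeled_patents: List[Tuple[List[str], str]],
             k: int = 3) -> List[Tuple[str, int]]:
    """Rank classification codes by term overlap with the k closest patents."""
    scored = []
    for terms, code in labeled_patents:
        overlap = sum(min(query[t], c) for t, c in Counter(terms).items())
        scored.append((overlap, code))
    scored.sort(key=lambda s: s[0], reverse=True)

    votes: Counter = Counter()
    for overlap, code in scored[:k]:
        if overlap > 0:  # ignore patents that share no terms with the query
            votes[code] += overlap
    return votes.most_common()


# Toy models: the patent model emits more abstract, patent-style terms.
paper_model = make_translator({"camera": "camera", "lens": "lens"})
patent_model = make_translator({"camera": "imaging_apparatus",
                                "lens": "optical_member"})

# Toy labeled patents: (terms, classification code).
labeled_patents = [
    (["imaging_apparatus", "optical_member"], "CODE_A"),
    (["engine", "fuel"], "CODE_B"),
]

paper_only = classify(Counter(paper_model("camera lens")), labeled_patents)
both_models = classify(combined_terms("camera lens", paper_model, patent_model),
                       labeled_patents)
```

In this toy setting the paper-style translation alone shares no terms with the labeled patents, while the merged term bag matches the patent-style vocabulary. This mirrors the abstract's argument that the patent translation model contributes patent terms even though its translation quality is lower.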