Automatic extraction of citation information in Japanese patent applications

  • Authors:
  • Hidetsugu Nanba;Natsumi Anzen;Manabu Okumura

  • Affiliations:
  • Hiroshima City University, Faculty of Information Sciences, 3-4-1 Ozukahigashi, Asaminamiku, 731-3194, Hiroshima, Japan;NEC System Technologies, 1-40-1 Tomo-minami, Asaminamiku, 731-3168, Hiroshima, Japan;Tokyo Institute of Technology, Precision and Intelligence Laboratory, 4259 Nagatsuta, 226-8503, Yokohama, Japan

  • Venue:
  • International Journal on Digital Libraries - Special Issue on Very Large Digital Libraries
  • Year:
  • 2008

Abstract

Academic researchers increasingly need to retrieve both patents and research papers, because applying for patents is now considered an important research activity. However, retrieving patents by keyword is laborious for researchers, because patents deliberately use more abstract terms than research papers do in order to enlarge the scope of their claims. We therefore constructed a framework that facilitates patent retrieval for researchers and integrates research papers and patents by analysing the citation relationships between them. We identified the research papers cited in patents in two steps: (1) detection of sentences containing bibliographic information, and (2) extraction of bibliographic information from those sentences. To investigate the effectiveness of our method, we conducted two experiments. For Step 1, we prepared 42,073 sentences, among which a human subject manually identified 1,476 sentences containing citations of papers. For Step 2, we prepared 3,000 sentences, in which the titles, authors, and other bibliographic information were manually identified. We obtained a precision of 91.6% and a recall of 86.9% in Step 1, and a precision of 86.2% and a recall of 85.1% in Step 2. Finally, we constructed an information retrieval system that provides two methods of retrieving research papers and patents: retrieval by query, and retrieval by following the citation relationships between research papers and patents.
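
The two-step procedure and its evaluation can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not specify: the cue patterns, the function names (`contains_citation`, `extract_fields`, `precision_recall`), and the English example sentences are all assumptions made for illustration, whereas the study itself worked on Japanese patent text. The sketch only shows the shape of the pipeline, with sentence detection first, field extraction second, and scoring of the detected set against manual labels last.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical cues (assumptions, not taken from the paper):
# a four-digit year plus a venue/volume/page marker in the same sentence.
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")
VENUE_CUE = re.compile(r"\b(?:Vol\.|pp\.|Journal|Proceedings)", re.IGNORECASE)
TITLE = re.compile(r'"([^"]+?),?"')


def contains_citation(sentence: str) -> bool:
    """Step 1 (toy version): flag sentences that look like they cite a paper."""
    return bool(YEAR.search(sentence)) and bool(VENUE_CUE.search(sentence))


def extract_fields(sentence: str) -> Dict[str, Optional[str]]:
    """Step 2 (toy version): pull a quoted title and a year out of a sentence."""
    title = TITLE.search(sentence)
    year = YEAR.search(sentence)
    return {
        "title": title.group(1) if title else None,
        "year": year.group(1) if year else None,
    }


def precision_recall(predicted: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Compare predicted sentence indices against manually labelled ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example sentences; index 1 is the only one labelled as a citation.
sentences = [
    "The device comprises a housing and a lid.",
    'This technique is described in H. Tanaka, "Fast text indexing," '
    "Journal of Examples, Vol. 3, pp. 10-20, 2001.",
    "The method was first filed in 1999.",
]
gold_labels = {1}

detected = {i for i, s in enumerate(sentences) if contains_citation(s)}
fields = [extract_fields(sentences[i]) for i in sorted(detected)]
precision, recall = precision_recall(detected, gold_labels)
```

Precision here is the fraction of flagged sentences that really contain citations, and recall is the fraction of labelled citation sentences that were flagged, which is how the Step 1 figures above would normally be read.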