Associative document retrieval by query subtopic analysis and its application to invalidity patent search

  • Authors:
  • Toru Takaki;Atsushi Fujii;Tetsuya Ishikawa

  • Affiliations:
  • NTT DATA Corporation, Tokyo, Japan;University of Tsukuba, Tsukuba, Japan;University of Tsukuba, Tsukuba, Japan

  • Venue:
  • Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose an associative document retrieval method, in which a document is used as a query to search for other similar documents. Because a long document usually includes more than one topic, we first analyze a query document to extract multiple subtopics. For each subtopic element, a sub-query is produced and similar documents are retrieved with a relevance score. The relevance scores are weighted by the importance of each subtopic element and are integrated to determine the final relevant documents. In the calculation of the subtopic importance, the specificity of a query term is evaluated using entropy, which is the deviation degree of the appearances of the term in each subtopic element. We apply this method to an invalidity patent search. By exploiting certain unique features of Japanese patent claims, we use features distinguishing the preamble and the essential portion in a query patent claim. To demonstrate the effectiveness of our method, we experimentally evaluated our associative document retrieval method on five years of patent documents.