Prior art retrieval using the claims section as a bag of words

Authors:
Suzan Verberne;Eva D'hondt
Affiliations:
Information Foraging Lab, Radboud University Nijmegen;Information Foraging Lab, Radboud University Nijmegen
Venue:
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Year:
2009

Abstract

In this paper we describe our participation in the CLEF-IP 2009 task, which targeted prior-art search for topic patent documents. We opted for a baseline approach to get a feel for the specifics of the task and the documents used. Our system retrieved patent documents with a standard bag-of-words approach for both the Main Task and the English Task. In both runs, we extracted the claims sections from all English patents in the corpus and saved them in the Lemur index format, with the patent IDs as DOCIDs. These claims were then indexed using Lemur's BuildIndex function. In the topic documents we likewise used only the claims sections, which we extracted and converted to queries by removing stopwords and punctuation. We did not perform any term selection or query expansion. We retrieved 100 patents per topic using Lemur's RetEval function with the TF-IDF retrieval model. Compared to the other runs submitted to the track, we obtained good results in terms of nDCG (0.46) and moderate results in terms of MAP (0.054).
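
The pipeline described in the abstract (index claims text by patent ID, turn a topic's claims into a query by removing stopwords and punctuation, rank by TF-IDF, keep the top 100) can be illustrated with a minimal sketch in plain Python. This is not the authors' code and does not call Lemur: the stopword list, the function names `tokenize`, `build_index` and `retrieve`, and the simple `tf * idf` weighting are all assumptions made for illustration, and Lemur's own TF-IDF model uses a different term-weighting formula.

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; the abstract does not
# say which stopword list was used).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "said", "wherein"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = text.lower().translate(_PUNCT_TO_SPACE).split()
    return [w for w in words if w not in STOPWORDS]


def build_index(claims_by_id):
    """Return per-document term frequencies and corpus document frequencies.

    claims_by_id maps a patent ID (used as the document ID) to its claims text.
    """
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in claims_by_id.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document counts once per term
    return tf, df


def retrieve(query_claims, tf, df, k=100):
    """Rank documents by a simplified TF-IDF score and return the top k.

    The whole claims text of the topic is the query: no term selection,
    no query expansion.
    """
    n_docs = len(tf)
    query_tf = Counter(tokenize(query_claims))
    scores = {}
    for doc_id, counts in tf.items():
        score = 0.0
        for term, q_count in query_tf.items():
            if term in counts:
                idf = math.log(1.0 + n_docs / df[term])
                score += q_count * counts[term] * idf
        if score > 0.0:
            scores[doc_id] = score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

For example, calling `build_index` on a dictionary of patent IDs and claims texts and then `retrieve(topic_claims, tf, df, k=100)` yields up to 100 `(patent_id, score)` pairs in descending score order, which mirrors the 100 results per topic reported in the abstract. The patent IDs in any such toy corpus are hypothetical.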