United we fall, divided we stand: a study of query segmentation and PRF for patent prior art search

Authors:
Debasis Ganguly;Johannes Leveling;Gareth J.F. Jones
Affiliations:
Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland
Venue:
Proceedings of the 4th workshop on Patent information retrieval
Year:
2011

Abstract

Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective, primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This finding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into self-coherent sub-topic blocks. Each of these much shorter sub-topic blocks, which is representative of a specific aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries, or query streams, are interleaved to construct a final ranked list. This technique can exploit the potential benefit of QE, since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrate the potential benefits of QE in alleviating the vocabulary mismatch problem in patent search.
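
The segment-retrieve-interleave pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the interleaving scheme, so the round-robin merge with duplicate removal shown here is an assumption, not necessarily the authors' method. The names `interleave_round_robin` and `segmented_search`, the cutoff `k`, and the `segment` and `retrieve` callables are hypothetical placeholders for a text segmenter and a search engine.

```python
from itertools import zip_longest
from typing import Callable, List, Sequence


def interleave_round_robin(ranked_lists: Sequence[Sequence[str]], k: int = 1000) -> List[str]:
    """Merge ranked lists tier by tier: all rank-1 documents, then all rank-2, and so on.

    Assumption: round-robin order with duplicates dropped; the source abstract
    only says the per-segment results are "interleaved".
    """
    merged: List[str] = []
    seen = set()
    # zip_longest pads shorter lists with None so longer lists are still consumed.
    for tier in zip_longest(*ranked_lists):
        for doc_id in tier:
            if len(merged) >= k:
                return merged
            if doc_id is None or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def segmented_search(
    query_text: str,
    segment: Callable[[str], List[str]],   # hypothetical: splits a patent query into sub-topic blocks
    retrieve: Callable[[str], List[str]],  # hypothetical: returns ranked document ids for one sub-query
    k: int = 1000,
) -> List[str]:
    """Run one retrieval per segment and interleave the resulting ranked lists."""
    sub_queries = segment(query_text)
    return interleave_round_robin([retrieve(q) for q in sub_queries], k)
```

For example, with a toy segmenter that splits on `|` and a lookup-table retriever, `segmented_search("a|b", ...)` issues one retrieval per block and merges the two ranked lists, keeping each document only at its first (best) position. Query expansion, if used, would be applied inside `retrieve` on each shorter sub-query rather than on the full patent text.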