The impact of spelling errors on patent search

Authors:
Benno Stein; Dennis Hoppe; Tim Gollub
Affiliations:
Bauhaus-Universität Weimar, Weimar, Germany (all authors)
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

Searching patent databases is riskier than searching in other domains: a single relevant document that is overlooked during a patent search can prove expensive. While recent research focuses on specialized models and algorithms to improve the effectiveness of patent retrieval, we bring another aspect into focus: the detection and exploitation of patent inconsistencies. In particular, we analyze spelling errors in the assignee field of patents granted by the United States Patent & Trademark Office, and we introduce technology that improves retrieval effectiveness despite such typographical ambiguities. To this end, we (1) quantify spelling errors in terms of edit distance and phonological dissimilarity and (2) cast error detection as a learning problem that combines word dissimilarities with patent meta-features. For the task of finding all patents of a company, our approach improves recall from 96.7% (when using a state-of-the-art patent search engine) to 99.5%, while precision is compromised by only 3.7%.
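As a rough illustration of step (1), the two kinds of dissimilarity can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the concrete measures, so plain Levenshtein distance stands in for "edit distance" and American Soundex stands in for "phonological dissimilarity". The function names and the `spelling_features` helper are hypothetical.

```python
# Assumption: Levenshtein distance and American Soundex are illustrative
# stand-ins; the abstract does not say which measures the paper uses.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Soundex digit groups for consonants; vowels, y, h, and w have no digit.
_CODES = {ch: digit
          for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                               ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in group}


def soundex(word: str) -> str:
    """Four-character American Soundex code, or "" if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    last = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":            # h and w do not separate equal codes
            continue
        code = _CODES.get(c, "")
        if code and code != last:
            out.append(code)
        last = code              # a vowel resets the previous code
    return ("".join(out) + "000")[:4]


def spelling_features(a: str, b: str) -> dict:
    """Hypothetical feature vector for a pair of assignee name tokens."""
    longest = max(len(a), len(b)) or 1
    return {
        "edit_distance": edit_distance(a.lower(), b.lower()),
        "normalized_edit_distance": edit_distance(a.lower(), b.lower()) / longest,
        "same_phonetic_code": soundex(a) == soundex(b),
    }


features = spelling_features("Siemens", "Seimens")
```

For the transposed spelling "Seimens", the edit distance to "Siemens" is 2 while the Soundex codes coincide, which is the kind of signal a classifier in step (2) could combine with patent meta-features.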