Searching patent databases is riskier than searching in other domains: a single relevant document that is overlooked during a patent search can prove expensive. While recent research develops specialized models and algorithms to improve the effectiveness of patent retrieval, we bring another aspect into focus: the detection and exploitation of patent inconsistencies. In particular, we analyze spelling errors in the assignee field of patents granted by the United States Patent & Trademark Office, and we introduce technology that improves retrieval effectiveness despite such typographical ambiguities. To this end, we (1) quantify spelling errors in terms of edit distance and phonological dissimilarity and (2) cast error detection as a learning problem that combines word dissimilarities with patent meta-features. For the task of finding all patents of a company, our approach improves recall from 96.7% (when using a state-of-the-art patent search engine) to 99.5%, while precision is compromised by only 3.7%.
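To make the two word dissimilarities concrete, the following is a minimal sketch of how an edit distance and a phonetic comparison between two assignee spellings might be computed. It is an illustration under stated assumptions, not the method described above: Levenshtein distance stands in for the edit distance, American Soundex stands in for the phonological measure, and the function names (`levenshtein`, `soundex`, `pair_features`) and the example company spellings are hypothetical. The patent meta-features and the learning step are not shown.

```python
# Illustrative sketch only: Levenshtein distance and American Soundex are
# assumed stand-ins for the edit-distance and phonological measures.

# Soundex digit for each coded consonant; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def soundex(word: str) -> str:
    """Four-character American Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    last = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != last:  # collapse runs of the same code
                code += digit
            last = digit
        elif ch not in "HW":   # vowels separate equal codes; H and W do not
            last = ""
    return (code + "000")[:4]


def pair_features(a: str, b: str) -> dict:
    """Hypothetical dissimilarity features for a pair of assignee words."""
    dist = levenshtein(a.lower(), b.lower())
    longest = max(len(a), len(b)) or 1
    return {
        "edit_distance": dist,
        "normalized_edit_distance": dist / longest,
        "same_soundex": soundex(a) == soundex(b),
    }


if __name__ == "__main__":
    print(pair_features("Siemens", "Siemans"))
```

In a setting like the one described above, features of this kind would be only one part of the input to a classifier that decides whether two assignee strings refer to the same company.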