The purpose of this study was twofold: first, to examine whether a general automatic retrieval model, the Vector Space Model (VSM), can be used to discover similarities between Swedish patent claims; and second, to examine whether an additional morphological decompounding module at the pre-processing level improves the results. Three different topic sets consisting of patent claims were compared against an entire collection of 30,117 claims. The VSM was evaluated with and without the additional morphological decompounding modules. The results indicate that decompounding improves the performance of the retrieval model. However, the sublanguage of patent claims and the errors introduced during the Optical Character Recognition (OCR) process harmed the overall performance of the Natural Language Processing (NLP) applications as well as that of the retrieval model.
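The idea of running a VSM with and without decompounding can be illustrated with a minimal sketch. It is not the study's implementation: the toy lexicon, the two-part splitting rule, the smoothed TF-IDF weighting, and all function names (`decompound`, `tokenize`, `tfidf_vectors`, `cosine`, `rank`) are assumptions made for illustration, and a real system would rely on a proper Swedish morphological analyser.

```python
import math
from collections import Counter

# Hypothetical toy lexicon of known word parts (assumption for illustration).
LEXICON = {"patent", "krav", "ansökan", "skydd"}


def decompound(token, lexicon=LEXICON):
    """Split a token into two known parts if possible, else return it whole."""
    for i in range(2, len(token) - 1):
        head, tail = token[:i], token[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
        # Allow a linking -s- between the two parts.
        if head.endswith("s") and head[:-1] in lexicon and tail in lexicon:
            return [head[:-1], tail]
    return [token]


def tokenize(text, split_compounds):
    """Lowercase, split on whitespace, and optionally decompound each token."""
    tokens = text.lower().split()
    if not split_compounds:
        return tokens
    parts = []
    for token in tokens:
        parts.extend(decompound(token))
    return parts


def tfidf_vectors(docs_tokens):
    """Return sparse TF-IDF vectors (dicts) and the IDF table for the documents."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    # Smoothed IDF; one common weighting choice among several.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in docs_tokens
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, claims, split_compounds):
    """Rank claims against a query; returns (score, claim_index) pairs, best first."""
    docs_tokens = [tokenize(claim, split_compounds) for claim in claims]
    vectors, idf = tfidf_vectors(docs_tokens)
    query_counts = Counter(tokenize(query, split_compounds))
    # Query terms unseen in the collection get zero weight.
    query_vec = {t: c * idf.get(t, 0.0) for t, c in query_counts.items()}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


claims = ["patentkrav för skydd", "ansökan om skydd"]
without_split = rank("patent krav", claims, split_compounds=False)
with_split = rank("patent krav", claims, split_compounds=True)
```

In this toy setup the query terms only match the first claim once the compound `patentkrav` has been split into its parts, which is the kind of effect a decompounding module is meant to produce. OCR errors would break such lexicon lookups, consistent with the degradation the study reports.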