A vector space analysis of Swedish patent claims with different linguistic indices

Authors:
Linda Andersson
Affiliations:
Information Retrieval Facility, Vienna, Austria
Venue:
PaIR '10 Proceedings of the 3rd international workshop on Patent information retrieval
Year:
2010

Abstract

The purpose of this study was twofold: first, to examine whether a general automatic retrieval model, the Vector Space Model (VSM), can be used to discover similarities between Swedish patent claims; and second, to examine whether an additional morphological decompounding module at the pre-processing level improves the results. In the present study, three different topic sets consisting of patent claims were compared against the entire collection of 30,117 claims. The VSM was evaluated with and without additional morphological decompounding modules. The results indicate that decompounding has a positive effect on the performance of the retrieval model. However, the sublanguage of patent claims and the errors introduced during the Optical Character Recognition (OCR) process were harmful to the overall performance of the Natural Language Processing (NLP) applications as well as to that of the retrieval model.
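The abstract does not spell out the retrieval pipeline, so the following is only a minimal sketch of the general idea. It assumes a plain tf-idf weighted VSM ranked by cosine similarity, and a toy lexicon-based decompounder that also drops a Swedish linking "s" between parts. The lexicon `LEXICON`, the helpers `decompound`, `analyse`, `tfidf_index` and `rank`, and the example claims are all hypothetical and are not taken from the study. The study's actual decompounding modules and term weighting may differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only. A real Swedish
# decompounder would rely on a full lexicon and morphological rules.
LEXICON = {"kabel", "fäste", "pump", "vatten", "skruv", "hus", "tid", "plan"}


def tokenize(text):
    """Lower-case the text and keep runs of letters, including å, ä, ö, é."""
    return re.findall(r"[a-zåäöé]+", text.lower())


def decompound(word, lexicon=LEXICON, min_len=3):
    """Split a compound into lexicon words, trying the shortest head first.

    An optional linking 's' at the end of a non-final part is dropped
    (e.g. 'tidsplan' -> 'tid' + 'plan'). Returns [word] unchanged when
    no complete split into lexicon words is found.
    """
    if word in lexicon:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        # Try the head as-is, then with a trailing linking 's' removed.
        candidates = (head, head[:-1]) if head.endswith("s") else (head,)
        for h in candidates:
            if h in lexicon:
                rest = decompound(tail, lexicon, min_len)
                if all(part in lexicon for part in rest):
                    return [h] + rest
    return [word]


def analyse(text, use_decompounding):
    """Turn a claim into index terms, optionally splitting compounds."""
    tokens = tokenize(text)
    if use_decompounding:
        tokens = [part for tok in tokens for part in decompound(tok)]
    return tokens


def tfidf_index(docs):
    """Build tf-idf vectors (dicts term -> weight) for tokenised documents."""
    n = len(docs)
    df = Counter(term for toks in docs.values() for term in set(toks))
    # One common idf variant; the +1 keeps terms found in every doc non-zero.
    idf = {term: 1.0 + math.log(n / d) for term, d in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for doc_id, toks in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, claims, use_decompounding):
    """Rank claims (dict id -> text) by cosine similarity to the query."""
    docs = {cid: analyse(text, use_decompounding) for cid, text in claims.items()}
    vectors, idf = tfidf_index(docs)
    # Query terms that never occur in the collection get no weight.
    q_tf = Counter(analyse(query, use_decompounding))
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scores = {cid: cosine(q_vec, vec) for cid, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    claims = {
        "A": "fäste för kabel med skruv",
        "B": "pump för vatten",
        "C": "hus för pump",
    }
    print(rank("kabelfäste", claims, use_decompounding=False))
    print(rank("kabelfäste", claims, use_decompounding=True))
```

In this toy setting, the query compound `kabelfäste` shares no index term with any claim when decompounding is off, so every score is 0. Once it is split into `kabel` + `fäste`, claim A is the only claim with a non-zero score. This only illustrates the mechanism by which decompounding can help a VSM on compound-rich Swedish text. It is not a reproduction of the study's setup or results.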