Genre and domain in patent texts

Authors: Nelleke Oostdijk, Eva D'hondt, Hans van Halteren, Suzan Verberne
Affiliation (all authors): Radboud University, Nijmegen, Netherlands
Venue: PaIR '10: Proceedings of the 3rd International Workshop on Patent Information Retrieval
Year: 2010

Abstract

In this paper we investigate the variation in language use within the very broad patent domain. We find that language use (represented by syntactic phrases) not only differs from one patent class to the next, but is also a characteristic that sets apart the four sections of a patent (viz. Title, Abstract, Description and Claims). This lends support to the claim that these sections can be viewed as different text genres. For the development of a syntactic parser that is trained on patent texts, we quantify the domain and genre differences in terms of the amounts of text needed to train domain-dependent versions of the parser. Our quantified and exemplified findings on the domain variation in patent data are of interest for the patent retrieval and analysis communities.
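The abstract does not say how the phrase-level differences between sections were measured. The sketch below is therefore not the authors' method. It is a minimal illustration that assumes each section (or patent class) has already been reduced to a list of syntactic phrase strings, for example by a parser. It then compares the frequency profiles of two such lists using Jensen-Shannon divergence, a standard measure substituted here for illustration. The function names and example phrases are hypothetical.

```python
from collections import Counter
import math


def phrase_distribution(phrases):
    """Relative frequency of each phrase string in one section (hypothetical helper)."""
    counts = Counter(phrases)
    total = sum(counts.values())
    return {phrase: count / total for phrase, count in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2) between two phrase distributions.

    Returns 0.0 for identical distributions and 1.0 for fully disjoint ones.
    This is a substitute measure chosen for illustration, not the paper's metric.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of phrases.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence KL(a || m); only phrases present in a contribute.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical phrases extracted from two sections of the same patent.
claims = ["said device", "wherein the", "said device", "a method"]
abstract = ["a method", "the invention", "a device", "the invention"]

score = jensen_shannon(phrase_distribution(claims), phrase_distribution(abstract))
print(f"Claims vs. Abstract divergence: {score:.2f}")
```

Under this assumption, a higher score between, say, Claims and Abstract than between two Description sections from different patent classes would be one way to express that the sections behave like distinct genres. The paper's own quantification is instead framed in terms of how much training text a domain-dependent parser needs.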