In this paper we investigate the variation in language use within the very broad patent domain. We find that language use (represented by syntactic phrases) not only differs from one patent class to the next, but is also a characteristic that sets apart the four sections of a patent (viz. Title, Abstract, Description and Claims). This lends support to the claim that these sections can be viewed as different text genres. For the development of a syntactic parser that is trained on patent texts, we quantify the domain and genre differences in terms of the amounts of text needed to train domain-dependent versions of the parser. Our quantified and exemplified findings on the domain variation in patent data are of interest for the patent retrieval and analysis communities.