Patent document categorization based on semantic structural information

Authors:
Jae-Ho Kim; Key-Sun Choi
Affiliations:
Computer Science Department, Korea Advanced Institute of Science and Technology (KAIST), Semantic Web Research Center (SWRC), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea (both authors)
Venue:
Information Processing and Management: an International Journal
Year:
2007

Abstract

The number of patent documents is rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial for finding relevant existing patents in a given field, patent categorization is an important and practically useful research area. Patent documents are structured documents whose characteristics distinguish them from general documents, so these traits should be taken into account during categorization. In this paper, we automatically categorize Japanese patent documents by exploiting their structure: a patent is organized into claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method based on the k-NN (k-Nearest Neighbour) approach. To retrieve similar documents from the training set, we compare specific components that denote so-called semantic elements, such as claim, purpose, and application field, instead of comparing the whole texts. Because these components are marked by various user-defined tags, all of the components are first clustered into several semantic elements. These semantically clustered structural components are the basic features for patent categorization. The proposed method achieves a 74% improvement in categorization performance over a baseline system that does not use the structural information of patents.
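The retrieval step described in the abstract can be sketched as a k-NN classifier that compares documents element by element rather than as whole texts. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not stated in the abstract: the tag names and the `TAG_TO_ELEMENT` table are hypothetical and stand in for the paper's clustering of user-defined tags; per-element similarity is plain cosine over bag-of-words counts; all elements get equal weights; and the label is chosen by similarity-weighted voting among the k nearest training documents.

```python
from collections import Counter, defaultdict
import math

# HYPOTHETICAL tag-to-element table. The paper obtains this grouping by
# clustering user-defined tags; a hand-written dict stands in for that step.
TAG_TO_ELEMENT = {
    "claim": "claim", "claims": "claim",
    "purpose": "purpose", "object": "purpose",
    "effect": "effect", "advantageous-effects": "effect",
    "embodiment": "embodiment", "example": "embodiment",
    "technical-field": "field", "application-field": "field",
}

# Assumed equal weighting of semantic elements (not taken from the paper).
DEFAULT_WEIGHTS = dict.fromkeys(TAG_TO_ELEMENT.values(), 1.0)


def to_elements(tagged_parts):
    """Group (tag, text) pairs into one bag of words per semantic element."""
    elements = defaultdict(Counter)
    for tag, text in tagged_parts:
        element = TAG_TO_ELEMENT.get(tag.lower())
        if element is None:
            continue  # components with unmapped tags are ignored in this sketch
        elements[element].update(text.lower().split())
    return elements


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def structural_similarity(doc_a, doc_b, weights):
    """Weighted sum of per-element similarities instead of one whole-text score."""
    return sum(
        w * cosine(doc_a.get(e, Counter()), doc_b.get(e, Counter()))
        for e, w in weights.items()
    )


def knn_classify(query_parts, training, k=3, weights=None):
    """training: list of (tagged_parts, label); similarity-weighted k-NN vote."""
    weights = weights or DEFAULT_WEIGHTS
    query = to_elements(query_parts)
    scored = sorted(
        ((structural_similarity(query, to_elements(parts), weights), label)
         for parts, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score  # each neighbour votes with its similarity
    return max(votes, key=votes.get)


# Toy usage with illustrative IPC-style labels (not data from the paper).
training = [
    ([("claim", "a processor executing a sorting program on stored data"),
      ("purpose", "reduce computation time")], "G06F"),
    ([("claims", "a memory controller caching data for the processor"),
      ("effect", "faster data access")], "G06F"),
    ([("claim", "a kettle heating water with an electric coil"),
      ("object", "boil water quickly")], "A47J"),
    ([("claim", "a coffee grinder with a rotating blade"),
      ("advantageous-effects", "uniform coffee grounds")], "A47J"),
]
query = [("claim", "a processor sorting stored data"),
         ("purpose", "reduce computation time")]
print(knn_classify(query, training, k=3))  # → G06F
```

Scoring each semantic element separately keeps a word that appears in one document's claims from being matched against the same word in another document's effects section, which is the intuition behind comparing components rather than whole texts. The per-element weights are a natural place to emphasize more discriminative elements, but the equal values used here are placeholders.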