Vector space model for patent documents with hierarchical class labels

Authors:
Yen-Liang Chen; Yu-Ting Chiu
Venue:
Journal of Information Science
Year:
2012


Abstract

A vector space model (VSM) built from a selected set of important features is a common way to represent documents, including patent documents. Patent documents have two characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is hard to find terms common to patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. In this article we therefore propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which selects more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated on two document sets of 2400 and 9600 patent documents, with candidate terms extracted from their titles and abstracts. The experimental results show that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected by a weighted-summed selection process gives higher accuracy.
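
The abstract does not spell out the HFS algorithm, so the following is only a minimal sketch of the general idea it builds on: represent each document as a TF-IDF vector, then keep a small number of high-scoring terms per class at a chosen level of the label hierarchy. The toy corpus, the "A/01"-style labels, the summed TF-IDF score, and the helper names (`tfidf_vectors`, `class_at_level`, `select_features`, `project`) are all assumptions made for illustration. This is not the authors' proportional or weighted-summed selection process.

```python
import math
from collections import Counter, defaultdict

# Toy corpus of (hierarchical label, text). Labels and texts are invented
# for illustration and are not taken from the paper's data sets.
DOCS = [
    ("A/01", "seed planting machine with rotating drum"),
    ("A/01", "planting device for seed rows"),
    ("A/47", "folding chair with adjustable frame"),
    ("B/60", "vehicle brake control system"),
    ("B/60", "brake pressure sensor for vehicle wheel"),
    ("B/62", "bicycle frame with folding hinge"),
]


def tokenize(text):
    # Whitespace tokenization only; no stop-word removal or stemming.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * log(N / df)} dict per document."""
    tokenized = [tokenize(text) for _, text in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return vectors


def class_at_level(label, level):
    """Truncate a hierarchical label such as 'A/01' to its first `level` parts."""
    return "/".join(label.split("/")[:level])


def select_features(docs, vectors, level, per_class):
    """Keep the `per_class` terms with the highest summed TF-IDF in each class.

    The summed-TF-IDF score is an assumed stand-in for a real selection
    criterion; the paper's own scoring is not given in the abstract.
    """
    scores = defaultdict(Counter)
    for (label, _), vec in zip(docs, vectors):
        cls = class_at_level(label, level)
        for term, weight in vec.items():
            scores[cls][term] += weight
    selected = set()
    for counter in scores.values():
        # Sort by descending score, then alphabetically, so ties are deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        selected.update(term for term, _ in ranked[:per_class])
    return sorted(selected)


def project(vec, features):
    """Express a sparse TF-IDF dict as a dense vector over the selected features."""
    return [vec.get(f, 0.0) for f in features]


vectors = tfidf_vectors(DOCS)
features = select_features(DOCS, vectors, level=1, per_class=3)
matrix = [project(v, features) for v in vectors]
```

Calling `select_features` with `level=2` instead selects terms per subclass (for example "A/01" and "A/47" separately), which shows how the same corpus yields different feature sets depending on the level of the hierarchy at which classes are compared.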