Journal of Information Science
A vector space model (VSM) built from a selected set of important features is a common way to represent documents, including patent documents. Patent documents have two special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated by applying it to two document sets of 2400 and 9600 patent documents, where candidate terms are extracted from the titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected by a weighted-summed selection process gives higher accuracy.
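The abstract names the two selection processes but does not define them. The sketch below shows one plausible reading, purely for illustration and not the authors' HFS algorithm: it assumes each term already has a discriminative score at each level of the class hierarchy. The function names, the level names ("section", "subclass"), the scores, the shares, and the weights are all hypothetical.

```python
from collections import Counter


def proportional_select(level_scores, shares, k):
    """Assumed reading of 'proportional selection': each hierarchy level
    gets a quota of the k features in proportion to its share, and fills
    that quota with its own top-scoring terms not already selected."""
    total = sum(shares.values())
    selected = []
    for level, scores in level_scores.items():
        quota = round(k * shares[level] / total)
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.extend([t for t in ranked if t not in selected][:quota])
    return selected[:k]


def weighted_sum_select(level_scores, weights, k):
    """Assumed reading of 'weighted-summed selection': per-level scores are
    combined into one weighted score per term, and the top k terms win."""
    combined = Counter()
    for level, scores in level_scores.items():
        for term, score in scores.items():
            combined[term] += weights[level] * score
    ranked = sorted(combined, key=lambda t: (-combined[t], t))
    return ranked[:k]


# Hypothetical per-level term scores for a two-level class hierarchy.
level_scores = {
    "section": {"engine": 0.9, "valve": 0.6, "polymer": 0.2},
    "subclass": {"piston": 0.8, "valve": 0.7, "resin": 0.5},
}

print(proportional_select(level_scores, {"section": 1, "subclass": 1}, k=4))
print(weighted_sum_select(level_scores, {"section": 0.4, "subclass": 0.6}, k=3))
```

Under this reading, the proportional variant guarantees every level contributes terms, which is consistent with the better coverage reported, while the weighted-summed variant favors terms that score well across levels, which is consistent with the higher accuracy reported.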