Document clustering method using dimension reduction and support vector clustering to overcome sparseness

Authors:
Sunghae Jun;Sang-Sung Park;Dong-Sik Jang
Affiliations:
Department of Statistics, Cheongju University, 298, Daeseong-ro Sangdang-gu, Cheongju, Chungbuk 360-764, Republic of Korea;Graduate School of Management of Technology, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea;Division of Industrial Management Engineering, Korea University, 1, 5-Ka, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2014

Abstract

Many studies on developing technologies are published as articles, papers, or patents, and analyzing these documents reveals scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. Documents are difficult to analyze directly because raw text is not suitable input for statistical and machine learning methods. We therefore use text mining techniques to transform the documents into structured data. The resulting structured data are very sparse, which makes them hard to analyze. This study proposes a new method to overcome the sparsity problem in document clustering. We build a combined clustering method that uses dimension reduction and K-means clustering based on support vector clustering and the silhouette measure. In particular, we attempt to overcome the sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California, Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
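
As a rough illustration of two ideas named in the abstract, the sketch below builds a document-term matrix from a few toy documents, measures how sparse it is, and picks the number of K-means clusters by the mean silhouette value. It is a minimal sketch under stated assumptions, not the method of the paper: the toy documents, the function names, the whitespace tokenizer, and the farthest-first initialization are all invented for illustration, and the dimension reduction and support vector clustering steps are omitted.

```python
import math

def build_dtm(docs):
    """Return (vocabulary, document-term count matrix) using whitespace tokens."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            matrix[i][index[w]] += 1.0
    return vocab, matrix

def sparsity(matrix):
    """Fraction of cells in the matrix that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / cells

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(rows, k, iters=100):
    """Lloyd's K-means. Farthest-first initialization is an assumption made
    here so that the sketch is deterministic."""
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(dist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(rows, labels):
    """Mean silhouette value; points in singleton clusters contribute 0."""
    clusters = {}
    for r, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(r)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for r, lab in zip(rows, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(r, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(r, o) for o in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(rows)

# Hypothetical toy corpus: three patent-like and three sports-news-like documents.
docs = [
    "patent claim invention device",
    "patent claim device sensor",
    "invention device sensor claim",
    "match goal team score",
    "team score league goal",
    "league match goal team",
]
vocab, matrix = build_dtm(docs)
zero_fraction = sparsity(matrix)

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette(matrix, kmeans(matrix, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
best_labels = kmeans(matrix, best_k)

print("zero fraction:", zero_fraction)
print("silhouette by k:", scores)
print("chosen k:", best_k, "labels:", best_labels)
```

Even in this tiny corpus most cells of the matrix are zero, and the zero fraction grows quickly as the vocabulary expands, which is the sparsity problem the abstract refers to. A fuller pipeline would presumably reduce the dimensionality of the matrix before clustering, but how the paper combines that step with support vector clustering is not specified in the abstract and is not reproduced here.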