Hybrid clustering for validation and improvement of subject-classification schemes

Authors:
Frizo Janssens;Lin Zhang;Bart De Moor;Wolfgang Glänzel
Affiliations:
K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and Attentio SA/NV, StudioTROPE Building, Bloemenstraat 32, B-1000 Brussels, Belgium and K.U. Leuven, ESAT-SCD, Leuven, B ...;K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and WISE Lab, Dalian University of Technology, Dalian, China;K.U. Leuven, ESAT-SCD, Leuven, Belgium;K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and Hungarian Academy of Sciences, IRPS, Budapest, Hungary
Venue:
Information Processing and Management: an International Journal
Year:
2009

Abstract

A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002-2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined by an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle: textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the approximately 8300 journals meeting the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the "intellectual" reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal 'migration' allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters.
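
The two ingredients named in the abstract, a combined text/citation similarity between journals and a PageRank score on the cross-citation graph, can be illustrated with a small sketch. The abstract does not state how the two components are integrated, so the weighted linear combination below, the `weight` parameter, the damping factor of 0.85, and the four-journal toy data are all assumptions made for illustration, not the authors' procedure. In practice the combined matrix would then be passed to a clustering algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all row vectors."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def hybrid_similarity(text_sim, cite_sim, weight=0.5):
    """Assumed integration scheme: weighted average of the two matrices."""
    n = len(text_sim)
    return [
        [weight * text_sim[i][j] + (1 - weight) * cite_sim[i][j] for j in range(n)]
        for i in range(n)
    ]


def pagerank(citations, damping=0.85, iterations=100):
    """Power iteration; citations[i][j] counts citations from journal i to j."""
    n = len(citations)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1 - damping) / n] * n
        for i in range(n):
            out_total = sum(citations[i])
            for j in range(n):
                # A journal that cites nothing spreads its rank evenly.
                share = citations[i][j] / out_total if out_total else 1.0 / n
                new_rank[j] += damping * rank[i] * share
        rank = new_rank
    return rank


# Hypothetical toy data: four journals, three term counts each,
# and a 4x4 cross-citation count matrix (diagonal = journal self-citations).
text_vectors = [[5, 1, 0], [4, 2, 0], [0, 1, 6], [0, 2, 5]]
citations = [
    [10, 8, 1, 0],
    [9, 12, 0, 1],
    [1, 0, 11, 7],
    [0, 1, 6, 9],
]

text_sim = similarity_matrix(text_vectors)
cite_sim = similarity_matrix(citations)  # journals compared by citing profile
hybrid = hybrid_similarity(text_sim, cite_sim, weight=0.5)
ranks = pagerank(citations)

for i, row in enumerate(hybrid):
    print(i, [round(x, 3) for x in row], "PageRank:", round(ranks[i], 3))
```

In this toy data journals 0 and 1 share both vocabulary and citation partners, as do journals 2 and 3, so the hybrid matrix shows two clear blocks. Within a cluster, the journal with the highest PageRank value would be a candidate representative.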