Hybrid clustering for validation and improvement of subject-classification schemes

  • Authors:
  • Frizo Janssens;Lin Zhang;Bart De Moor;Wolfgang Glänzel

  • Affiliations:
  • K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and Attentio SA/NV, StudioTROPE Building, Bloemenstraat 32, B-1000 Brussels, Belgium and K.U. Leuven, ESAT-SCD, Leuven, B ...;K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and WISE Lab, Dalian University of Technology, Dalian, China;K.U. Leuven, ESAT-SCD, Leuven, Belgium;K.U. Leuven, Centre for R&D Monitoring (ECOOM), Dept. MSI, Leuven, Belgium and Hungarian Academy of Sciences, IRPS, Budapest, Hungary

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2009

Abstract

A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002-2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined on an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle: textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the roughly 8300 journals that meet the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the "intellectual" reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows the clusters to be labelled using cognitive characteristics, while the citation component allows the cross-citation graph to be visualised and representative journals, as suggested by the PageRank algorithm, to be determined. Finally, the analysis of journal 'migration' allows existing classification schemes to be improved on the basis of the concordance between fields and clusters.
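The two ingredients named in the abstract, a combined text/citation similarity between journals and a PageRank ranking over the cross-citation graph, can be illustrated with a minimal Python sketch. The abstract does not say how the textual and citation components are integrated, so the weighted average of cosine similarities used here is an assumption and not necessarily the authors' scheme. The function names, the `weight` parameter, and the four-journal toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(text_vecs, cite_vecs, weight=0.5):
    """Journal-by-journal similarity matrix.

    ASSUMPTION: text and citation similarities are merged by a simple
    weighted average; the abstract does not state the actual integration.
    """
    n = len(text_vecs)
    return [
        [
            weight * cosine(text_vecs[i], text_vecs[j])
            + (1 - weight) * cosine(cite_vecs[i], cite_vecs[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank; adj[i][j] = citations from journal i to journal j."""
    n = len(adj)
    rank = [1.0 / n] * n
    out_totals = [sum(row) for row in adj]
    for _ in range(iters):
        new_rank = [(1 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_totals[i] == 0:
                    # A journal citing nothing spreads its rank uniformly.
                    new_rank[j] += damping * rank[i] / n
                else:
                    new_rank[j] += damping * rank[i] * adj[i][j] / out_totals[i]
        rank = new_rank
    return rank


# Hypothetical toy data: term weights per journal, aggregated from its papers.
text_vecs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1], [0, 1, 1, 3]]
# Hypothetical cross-citation counts (diagonal = journal self-citations).
cites = [[10, 5, 0, 0], [8, 12, 1, 0], [0, 0, 9, 6], [0, 1, 4, 11]]

sim = hybrid_similarity(text_vecs, cites, weight=0.5)
rank = pagerank(cites)
most_representative = max(range(len(rank)), key=rank.__getitem__)
```

A similarity matrix such as `sim` could then be passed to any standard clustering routine with the number of clusters fixed at 22, which is what makes the field-to-cluster comparison with the ESI scheme direct.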