A clustering study of a 7000 EU document inventory using MDS and SOM

  • Authors:
  • Patrick A. De Mazière; Marc M. Van Hulle

  • Affiliations:
  • Laboratorium voor Neuro- en Psychofysiologie, K.U.Leuven, Leuven, Belgium (both authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Abstract

In this article, we discuss a number of methods and tools for clustering an inventory of 7000 documents in order to evaluate the impact of EU-funded research in the social sciences and humanities on EU policies. The inventory, which is not publicly available but was provided to us by the European Union (EU) in the framework of an EU project, could be divided into three main categories: research documents, influential policy documents, and policy documents. To represent the results in a way that non-experts could make use of, we explored and compared two visualisation techniques, multi-dimensional scaling (MDS) and the self-organising map (SOM), as well as one of the latter's derivatives, the U-matrix. In contrast to most other approaches, which perform text analyses only on document titles and abstracts, we performed a full-text analysis on more than 300,000 pages in total. Because many software suites cannot handle text mining problems of this size, we developed our own analysis platform. We show that the combination of a U-matrix and an MDS map, which is rarely performed in the domain of text mining, reveals information that would otherwise go unnoticed. Furthermore, we show that the combination of a database, to store the data and the (intermediate) results, and a web server, to visualise the results, offers a powerful platform to analyse the data and share the results with all participants and collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing the consistency of both data and results.
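
To make the two visualisation techniques named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' analysis platform. It assumes classical (Torgerson) MDS, although the abstract does not say which MDS variant was used, and a small rectangular SOM with a Gaussian neighbourhood and linearly decaying learning rate. The function names `classical_mds`, `train_som` and `u_matrix`, and all parameter defaults, are hypothetical choices made for illustration.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean
    distances in the embedding approximate the entries of `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a rows x cols SOM with online updates (assumed schedule:
    linearly decaying learning rate and neighbourhood width)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise each unit with a randomly chosen data vector.
    init = rng.integers(0, len(data), rows * cols)
    weights = data[init].reshape(rows, cols, -1).copy()
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the unit whose weight vector is closest to x.
        sq_err = ((weights - x) ** 2).sum(axis=2)
        bmu = np.unravel_index(np.argmin(sq_err), (rows, cols))
        grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


def u_matrix(weights):
    """Mean distance from each unit's weight vector to those of its
    4-connected grid neighbours (the grid needs at least two units)."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u
```

In such a setup, high U-matrix values mark borders between groups of similar units, while an MDS map of the same document vectors (or of the SOM weight vectors) shows how far apart those groups lie, which is one plausible reading of why the two views complement each other.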