A UML profile for the conceptual modelling of data-mining with time-series in data warehouses

  • Authors:
  • Jose Zubcoff;Jesús Pardillo;Juan Trujillo

  • Affiliations:
  • Lucentia Research Group, Department of Sea Sciences and Applied Biology, University of Alicante, 03690 Alicante, Spain;Lucentia Research Group, Department of Software and Computing Systems, University of Alicante, 03690 Alicante, Spain;Lucentia Research Group, Department of Software and Computing Systems, University of Alicante, 03690 Alicante, Spain

  • Venue:
  • Information and Software Technology
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Time-series analysis is a powerful technique to discover patterns and trends in temporal data. However, the lack of a conceptual model for this data-mining technique forces analysts to deal with unstructured data. These data are represented at a low-level of abstraction and their management is expensive. Most analysts face up to two main problems: (i) the cleansing of the huge amount of potentially-analysable data and (ii) the correct definition of the data-mining algorithms to be employed. Owing to the fact that analysts' interests are also hidden in this scenario, it is not only difficult to prepare data, but also to discover which data is the most promising. Since their appearance, data warehouses have, therefore, proved to be a powerful repository of historical data for data-mining purposes. Moreover, their foundational modelling paradigm, such as, multidimensional modelling, is very similar to the problem domain. In this article, we propose a unified modelling language (UML) extension through UML profiles for data-mining. Specifically, the UML profile presented allows us to specify time-series analysis on top of the multidimensional models of data warehouses. Our extension provides analysts with an intuitive notation for time-series analysis which is independent of any specific data-mining tool or algorithm. In order to show its feasibility and ease of use, we apply it to the analysis of fish-captures in Alicante. We believe that a coherent conceptual modelling framework for data-mining assures a better and easier knowledge-discovery process on top of data warehouses.