Multi-source Data Modelling: Integrating Related Data to Improve Model Performance

  • Authors:
  • Paul R. Trundle; Daniel C. Neagu; Qasim Chaudhry

  • Affiliations:
  • University of Bradford, Richmond Road, Bradford, West Yorkshire, BD7 1DP, UK; University of Bradford, Richmond Road, Bradford, West Yorkshire, BD7 1DP, UK; Central Science Laboratory, Sand Hutton, York, YO41 1LZ, UK

  • Venue:
  • MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Abstract

Traditional methods in Data Mining cannot be applied to all types of data with equal success. Innovative methods for model creation are needed to address poor model performance on data from which relationships are difficult to extract. This paper proposes a set of algorithms that integrate data from multiple related datasets, and presents results from applying these techniques to data from the field of Predictive Toxicology. The results show significant improvements when related data is used to aid the model creation process, both overall and in specific data ranges. The proposed algorithms have potential for use in any field where multiple related datasets exist, particularly in fields combining computing, chemistry and biology.