CosDic: Towards a Comprehensive System for Knowledge Discovery in Large-Scale Data: Architecture, Implementation and Case Studies

  • Authors:
  • Bin Wu; Shengqi Yang; Haizhou Zhao; Yuan Gao; Lijun Suo

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2009

Abstract

The continued exponential growth in both the volume and the complexity of information poses new challenges for analysts, researchers and intelligence providers. In this paper, to move this research toward practice, we describe a prototype of CosDic, our system under construction for knowledge discovery from extremely large-scale datasets. The core infrastructure of CosDic is deployed on a distributed cluster using the MapReduce platform. To support mining tasks ranging from gigabytes to petabytes, we designed the system carefully at every level: from architecture to individual algorithms, from the underlying layer to the upper-layer public service interface, and with attention to both effectiveness and efficiency. Moreover, to illustrate its functionality, we apply CosDic to a very large real-world dataset and demonstrate an integrated analysis procedure, from initial raw data preprocessing to final knowledge discovery. We show that CosDic performs well in such cloud-scale data computing.