Proceedings of the third international workshop on Cloud data management
The continued exponential growth in both the volume and the complexity of information poses a new challenge for analysts, researchers, and intelligence providers. In this paper, to move this research toward practice, we present a prototype of CosDic, our system under construction for knowledge discovery from extremely large-scale datasets. The core infrastructure of CosDic is deployed on a distributed cluster using the MapReduce platform. To support mining tasks ranging from gigabytes to petabytes, we carefully designed the system from its architecture to its individual algorithms, and from its underlying layers to its upper-layer public service interface, with attention to both effectiveness and efficiency. Moreover, to illustrate its functionality, we apply CosDic to a huge real-world dataset and demonstrate an integrated analysis procedure, from initial raw-data preprocessing to final knowledge discovery. We show that CosDic performs well in such cloud-scale data computing.