The continued exponential growth in both the volume and the complexity of information, set against the computing capacity of silicon-based devices constrained by Moore's Law, poses a new challenge for analysts, researchers and intelligence providers. In response, a new class of techniques and computing platforms focused on scalability and parallelism, such as the Map-Reduce model, has been emerging. In this paper, to move from scientific prototype toward practice, we describe a prototype of our applied distributed system, DisTec, for knowledge discovery from a social network perspective in the field of telecommunications. The main infrastructure is built on Hadoop, an open-source counterpart of Google's Map-Reduce. We designed the system to carry out mining tasks on terabytes of call records. To illustrate its functionality, DisTec is applied to a real-world, large-scale telecom dataset. The experiments range from initial raw data preprocessing to final knowledge extraction. We demonstrate that our system performs well in such cloud-scale data computing.