Performance comparisons of spatial data processing techniques for a large scale mobile phone dataset

  • Authors:
  • Apichon Witayangkurn;Teerayut Horanont;Ryosuke Shibasaki

  • Affiliations:
  • The University of Tokyo, Komaba, Tokyo, Japan;The University of Tokyo, Komaba, Tokyo, Japan;The University of Tokyo, Kashiwa-shi, Chiba, Japan

  • Venue:
  • Proceedings of the 3rd International Conference on Computing for Geospatial Research and Applications
  • Year:
  • 2012

Abstract

Mobile technology, especially the mobile phone, is very popular nowadays. The increasing number of mobile users and the availability of GPS-embedded mobile phones generate large volumes of GPS trajectories that can be used in research areas such as human mobility and transportation planning. However, handling such a large-scale dataset is a significant issue, particularly in the spatial analysis domain. In this paper, we explore a suitable way to extract the geo-location of GPS coordinates that supports large-scale data, processes it quickly, and scales easily in both storage and calculation speed. Geo-locations are cities, zones, or any points of interest. Our dataset consists of GPS trajectories of 1.5 million individual mobile phone users in Japan accumulated over one year, approximately 9.2 billion records in total. We therefore conducted performance comparisons of several methods for processing spatial data, particularly for a huge dataset. First, we processed the data on PostgreSQL with PostGIS, the traditional approach to spatial data processing. Second, we used a Java application with a spatial library, the Java Topology Suite (JTS). Third, we tried the Hadoop cloud computing platform, focusing on Hive on top of Hadoop to provide SQL-like support. However, Hadoop/Hive did not support spatial queries at the time of writing, so we proposed a solution to enable spatial support on Hive. In our results, Hadoop/Hive with spatial support performed best in large-scale processing among the evaluated methods. In addition, we recommend techniques in Hadoop/Hive for processing different types of spatial data.
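
The core operation the abstract describes, mapping a GPS coordinate to the zone that contains it, can be illustrated with a small point-in-polygon lookup. The sketch below is not the authors' implementation and does not use PostGIS, JTS, or Hive; it is a plain Python ray-casting test with a bounding-box pre-filter, and the zone names, coordinates, and function names (`point_in_polygon`, `bounding_box`, `locate`) are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # vertices in order, not repeated at the end


def point_in_polygon(pt: Point, polygon: Polygon) -> bool:
    """Ray casting: cast a ray to the right of pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def locate(pt: Point, zones: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first zone containing pt, or None."""
    x, y = pt
    for name, polygon in zones.items():
        min_x, min_y, max_x, max_y = bounding_box(polygon)
        # Cheap rejection before the exact test.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        if point_in_polygon(pt, polygon):
            return name
    return None


# Hypothetical zones: two adjacent squares, not real administrative boundaries.
ZONES: Dict[str, Polygon] = {
    "zone_a": [(139.0, 35.0), (140.0, 35.0), (140.0, 36.0), (139.0, 36.0)],
    "zone_b": [(140.0, 35.0), (141.0, 35.0), (141.0, 36.0), (140.0, 36.0)],
}

if __name__ == "__main__":
    for gps_point in [(139.5, 35.5), (140.5, 35.5), (138.0, 35.5)]:
        print(gps_point, "->", locate(gps_point, ZONES))
```

Each GPS record is processed independently of the others, which is one reason this kind of lookup lends itself to the distributed processing the abstract evaluates.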