Performance comparisons of spatial data processing techniques for a large scale mobile phone dataset

Authors:
Apichon Witayangkurn; Teerayut Horanont; Ryosuke Shibasaki
Affiliations:
The University of Tokyo, Komaba, Tokyo, Japan; The University of Tokyo, Komaba, Tokyo, Japan; The University of Tokyo, Kashiwa-shi, Chiba, Japan
Venue:
Proceedings of the 3rd International Conference on Computing for Geospatial Research and Applications
Year:
2012

Abstract

Mobile technology, especially the mobile phone, is very popular nowadays. The growing number of mobile users and the availability of GPS-embedded mobile phones generate large amounts of GPS trajectories that can be used in research areas such as human mobility and transportation planning. However, handling such a large-scale dataset is a significant issue, particularly in the spatial analysis domain. In this paper, we explore a suitable way to extract the geo-location of GPS coordinates that supports large-scale data, processes it quickly, and scales easily in both storage and calculation speed. Here, geo-locations are cities, zones, or any points of interest. Our dataset consists of GPS trajectories from 1.5 million individual mobile phone users in Japan, accumulated over one year and totaling approximately 9.2 billion records. We therefore compared the performance of several methods for processing spatial data at this scale. First, we processed the data in PostgreSQL with PostGIS, the traditional approach to spatial data processing. Second, we used a Java application with the spatial library Java Topology Suite (JTS). Third, we used the Hadoop cloud computing platform, focusing on Hive on top of Hadoop to provide SQL-like querying. However, Hadoop/Hive did not support spatial queries at the time, so we proposed a solution to enable spatial support in Hive. As a result, Hadoop/Hive with spatial support performed best in large-scale processing among the evaluated methods. In addition, we recommend techniques in Hadoop/Hive for processing different types of spatial data.
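The core operation described above, assigning each GPS coordinate to the city or zone that contains it, can be illustrated with a small point-in-polygon sketch. This is not the authors' PostGIS, JTS, or Hive implementation; it is a plain Python stand-in, assuming zones are simple (non-self-intersecting) polygons given as (longitude, latitude) vertex lists, and it uses a bounding-box prefilter followed by a ray-casting test. The names `Zone`, `locate`, and `count_by_zone` and all sample coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass
class Zone:
    """A named polygon zone, e.g. a city or district (hypothetical data)."""
    name: str
    vertices: List[Point]
    bbox: Tuple[float, float, float, float] = field(init=False)

    def __post_init__(self) -> None:
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        # Precompute (min_x, min_y, max_x, max_y) so most zones can be rejected cheaply.
        self.bbox = (min(xs), min(ys), max(xs), max(ys))


def in_bbox(p: Point, bbox: Tuple[float, float, float, float]) -> bool:
    x, y = p
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def point_in_polygon(p: Point, vertices: List[Point]) -> bool:
    """Ray casting: toggle on each edge crossed by a ray from p toward +x."""
    x, y = p
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(p: Point, zones: Iterable[Zone]) -> Optional[str]:
    """Return the name of the first zone containing p, or None."""
    for zone in zones:
        if in_bbox(p, zone.bbox) and point_in_polygon(p, zone.vertices):
            return zone.name
    return None


def count_by_zone(points: Iterable[Point], zones: List[Zone]) -> Dict[str, int]:
    """Count points per zone, analogous to a GROUP BY on the located zone."""
    counts: Dict[str, int] = {}
    for p in points:
        name = locate(p, zones)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Usage with two hypothetical adjacent square zones.
zones = [
    Zone("zone_a", [(139.0, 35.0), (140.0, 35.0), (140.0, 36.0), (139.0, 36.0)]),
    Zone("zone_b", [(140.0, 35.0), (141.0, 35.0), (141.0, 36.0), (140.0, 36.0)]),
]
points = [(139.5, 35.5), (140.5, 35.2), (139.2, 35.9), (142.0, 35.5)]
print(count_by_zone(points, zones))  # {'zone_a': 2, 'zone_b': 1}
```

The per-record `locate` step is independent for each GPS point, which is why this kind of lookup parallelizes naturally; on Hadoop/Hive it would plausibly be wrapped as a user-defined function and followed by an aggregation, although the paper's actual spatial extension may differ. With billions of records and many zones, a spatial index over the zone bounding boxes would replace the linear scan in `locate`.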