Feature engineering for semantic place prediction

Authors:
Yin Zhu; Erheng Zhong; Zhongqi Lu; Qiang Yang
Venue:
Pervasive and Mobile Computing
Year:
2013

Abstract

This paper presents our winning solution to Dedicated Task 1 of the Nokia Mobile Data Challenge (MDC). The task is to infer the semantic category of a place from the smartphone sensing data collected at that place. We approach it as a standard supervised learning problem: we extract discriminative features from the sensor data and train state-of-the-art classifiers (SVM, logistic regression, and the decision tree family) on them. We find that feature engineering, that is, constructing features using human heuristics, is very effective for this task. In particular, we propose a novel feature engineering technique, Conditional Feature (CF), a general framework for domain-specific feature construction. In total, we generated 2,796,200 features, and in our final five submissions we used feature selection to choose between 100 and 2,000 of them. One of our key findings is that features conditioned on fine-granularity time intervals, e.g. every 30 min, are the most effective. Our best 10-fold cross-validation accuracy on the training set is 75.1%, achieved by Gradient Boosted Trees; the second best is 74.6%, achieved by L1-regularized Logistic Regression. Beyond these results, we also briefly report our experience of using the F# language for large-scale (~70 GB of raw text data) conditional feature construction.
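
The abstract does not spell out how a Conditional Feature is computed. One plausible reading is that a base statistic is computed separately for each value of a conditioning variable, such as the 30-minute time-of-day slot in which a sensor record was logged. The Python sketch below illustrates that reading only; it is not the authors' F# implementation, and the record layout, the function names `time_slot` and `conditional_counts`, and the toy data are all hypothetical.

```python
from collections import Counter
from datetime import datetime

SLOT_MINUTES = 30  # granularity named in the abstract; gives 48 slots per day


def time_slot(ts, slot_minutes=SLOT_MINUTES):
    """Map a timestamp to its time-of-day slot index (0..47 for 30 min)."""
    return (ts.hour * 60 + ts.minute) // slot_minutes


def conditional_counts(records, slot_minutes=SLOT_MINUTES):
    """Build one feature vector per place, conditioned on the time slot.

    `records` is an iterable of (place_id, timestamp) pairs, a hypothetical
    layout standing in for the raw sensor logs. Feature s of a place is the
    fraction of that place's records that fall into slot s.
    """
    n_slots = 24 * 60 // slot_minutes
    per_place = {}
    for place_id, ts in records:
        slot = time_slot(ts, slot_minutes)
        per_place.setdefault(place_id, Counter())[slot] += 1

    features = {}
    for place_id, counts in per_place.items():
        total = sum(counts.values())
        features[place_id] = [counts[s] / total for s in range(n_slots)]
    return features


# Toy data, invented for illustration only.
records = [
    ("place_a", datetime(2012, 1, 2, 9, 5)),
    ("place_a", datetime(2012, 1, 2, 9, 20)),
    ("place_a", datetime(2012, 1, 2, 14, 45)),
    ("place_b", datetime(2012, 1, 2, 23, 59)),
]
features = conditional_counts(records)
```

Under this reading, swapping the conditioning variable (for example, day of week instead of time slot) or the base statistic (for example, mean signal strength instead of a record count) yields a new family of features, which may be how a feature count in the millions arises before feature selection.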