Crowd++: unsupervised speaker count with smartphones

  • Authors:
  • Chenren Xu; Sugang Li; Gang Liu; Yanyong Zhang; Emiliano Miluzzo; Yih-Farn Chen; Jun Li; Bernhard Firner

  • Affiliations:
  • Rutgers University, North Brunswick, NJ, USA; Rutgers University, North Brunswick, NJ, USA; UT Dallas, Richardson, TX, USA; Rutgers University, North Brunswick, NJ, USA; AT&T Labs Research, Florham Park, NJ, USA; AT&T Labs Research, Florham Park, NJ, USA; Rutgers University, North Brunswick, NJ, USA; Rutgers University, North Brunswick, NJ, USA

  • Venue:
  • Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '13)
  • Year:
  • 2013


Abstract

Smartphones are excellent mobile sensing platforms, with the microphone in particular already used in several audio inference applications. We take smartphone audio inference a step further and demonstrate for the first time that it is possible to accurately estimate the number of people talking in a given place -- with an average error distance of 1.5 speakers -- through unsupervised machine learning analysis of audio segments captured by the smartphones. Inference occurs transparently to the user, and no human intervention is needed to derive the classification model. Our results are based on the design, implementation, and evaluation of a system called Crowd++, involving 120 participants in 10 very different environments. We show that neither dedicated external hardware nor cumbersome supervised learning approaches are needed; off-the-shelf smartphones, used in a transparent manner, suffice. We believe our findings have profound implications for many research fields, including social sensing and personal wellbeing assessment.
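The abstract describes the counting pipeline only at a high level. As a rough illustration of the unsupervised idea (a sketch, not the authors' exact Crowd++ algorithm), the snippet below segments an audio recording, extracts an averaged MFCC vector per segment, and greedily merges segments whose vectors are close under cosine distance; the number of resulting clusters is the speaker-count estimate. The use of librosa, the 3-second segment length, the 20 MFCC coefficients, the merge threshold, and the input file name are all illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of unsupervised speaker counting (NOT the exact
# Crowd++ algorithm): segment audio, extract MFCC features per segment,
# and greedily merge segments that are close in cosine distance.
import numpy as np
import librosa
from scipy.spatial.distance import cosine

SEGMENT_SEC = 3.0       # assumed segment length
N_MFCC = 20             # assumed number of MFCC coefficients
MERGE_THRESHOLD = 0.35  # assumed cosine-distance cutoff for "same speaker"

def segment_features(path):
    """Load audio and return one averaged MFCC vector per segment."""
    y, sr = librosa.load(path, sr=16000, mono=True)
    seg_len = int(SEGMENT_SEC * sr)
    feats = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=N_MFCC)
        feats.append(mfcc.mean(axis=1))  # average frames -> one vector
    return feats

def count_speakers(path):
    """Greedy unsupervised clustering: each cluster ~ one speaker."""
    clusters = []  # one representative feature vector per inferred speaker
    for vec in segment_features(path):
        distances = [cosine(vec, rep) for rep in clusters]
        if distances and min(distances) < MERGE_THRESHOLD:
            # Close to an existing cluster: fold into its representative.
            i = int(np.argmin(distances))
            clusters[i] = (clusters[i] + vec) / 2.0
        else:
            clusters.append(vec)  # far from all clusters: new speaker
    return len(clusters)

if __name__ == "__main__":
    # "conversation.wav" is a placeholder input file.
    print("Estimated speaker count:", count_speakers("conversation.wav"))
```

The actual system additionally exploits pitch to infer speaker gender and avoid merging segments from different-gender speakers; this sketch omits that refinement for brevity.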