sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices -- a feasibility study

Authors:
Shahriar Nirjon;Robert Dickerson;John Stankovic;Guobin Shen;Xiaofan Jiang
Affiliations:
University of Virginia;University of Virginia;University of Virginia;Microsoft Research Asia, Beijing, China;Intel Labs China, Beijing, China
Venue:
Proceedings of the 14th Workshop on Mobile Computing Systems and Applications
Year:
2013

Abstract

Because of their limited processing capability, contemporary smartphones cannot extract frequency-domain acoustic features on the device in real time when the sampling rate is high. We propose a solution that exploits the sparseness of speech to extract frequency-domain acoustic features on a smartphone in real time, without any support from a remote server, even at sampling rates as high as 44.1 kHz. We perform an empirical study to quantify the sparseness of speech recorded on a smartphone and use the result to efficiently obtain a highly accurate, sparse approximation of a widely used speech feature, the Mel-Frequency Cepstral Coefficients (MFCC). We call the new feature the sparse MFCC, or sMFCC for short. We experimentally determine the trade-off between the approximation error and the expected speedup of sMFCC. We implement a simple spoken-word recognition application using both MFCC and sMFCC features, show that sMFCC is expected to be up to 5.84 times faster with accuracy within 1.1% to 3.9% of that of MFCC, and determine the conditions under which sMFCC runs in real time.
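The idea of approximating MFCC from a sparse spectrum can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their algorithm, so the code below simply computes a full (naive) DFT, keeps the k largest-magnitude bins, and feeds the result through a textbook MFCC pipeline. All function names, the filterbank layout, the log floor of 1e-10, and the synthetic test frame are assumptions made for illustration.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(n^2) DFT of a real frame; returns bins 0 .. n//2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def keep_top_k(spectrum, k):
    """Zero every bin except the k with the largest magnitude."""
    order = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0j for i, c in enumerate(spectrum)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [f / nyquist * (n_bins - 1) for f in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(spectrum, bank, n_coeffs):
    """Log mel-filterbank energies followed by a DCT-II."""
    power = [abs(c) ** 2 for c in spectrum]
    # The 1e-10 floor (an assumption) avoids log(0) for filters with no energy.
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    m = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
        for i in range(n_coeffs)
    ]


if __name__ == "__main__":
    # Synthetic stand-in for a speech frame: two tones plus a little noise.
    rng = random.Random(0)
    sr, n = 8000, 256
    frame = [
        math.sin(2 * math.pi * 300 * t / sr)
        + 0.5 * math.sin(2 * math.pi * 1200 * t / sr)
        + 0.01 * rng.uniform(-1, 1)
        for t in range(n)
    ]
    spec = dft(frame)
    bank = mel_filterbank(20, len(spec), sr)
    full = mfcc(spec, bank, 13)
    for k in (4, 16, 64):
        approx = mfcc(keep_top_k(spec, k), bank, 13)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(full, approx)))
        print(k, round(err, 3))
```

Because the sketch still computes the full transform before discarding bins, it only illustrates the approximation-error side of the trade-off. Any speedup would have to come from a transform that finds the dominant bins without computing all of them, which this sketch does not attempt.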