This paper considers phoneme recognition from a speaker's facial expressions in voice-activated control systems. We developed a neural network recognition algorithm based on the phonetic words decoding method, under the requirement that voice commands are pronounced as isolated syllables. The paper presents experimental results on the classification of visemes (the facial and lip positions corresponding to a particular phoneme) for Russian vowels. We show how classification accuracy depends on the classifier (multilayer feed-forward network, support vector machine, k-nearest neighbor method), the image features (histogram of oriented gradients, eigenvectors, SURF local descriptors), and the type of camera (built-in or Kinect). The best speaker-dependent recognition accuracy, obtained with histogram of oriented gradients features and a support vector machine, is 85% for a built-in camera and 96% for Kinect depth maps.
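To make the feature-plus-classifier pipeline concrete, the following is a minimal sketch and not the authors' implementation. It computes a simplified histogram of oriented gradients (per-cell histograms with a single global L2 normalization, with block normalization omitted) and classifies with the k-nearest neighbor method, one of the three classifiers compared. The best results reported here use a support vector machine; k-nearest neighbor is chosen only so the sketch needs no external library. The striped toy images, the labels "A" and "U", and all function names and parameter values are hypothetical stand-ins for real mouth-region images and viseme classes.

```python
import math
from collections import Counter


def hog_features(image, cell=4, bins=9):
    """Simplified HOG: per-cell unsigned-orientation histograms, L2-normalized.

    `image` is a list of rows of grayscale values. Block normalization from
    the full HOG descriptor is omitted (an assumption made for brevity).
    """
    h, w = len(image), len(image[0])
    features = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle / (180.0 / bins)) % bins] += magnitude
            features.extend(hist)
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    return [v / norm for v in features]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def stripes(vertical, phase, size=8):
    """Hypothetical toy image: vertical or horizontal stripes of width 2."""
    return [[255 if ((x if vertical else y) + phase) % 4 < 2 else 0
             for x in range(size)] for y in range(size)]


# Toy training set: vertical stripes stand in for one viseme class ("A"),
# horizontal stripes for another ("U"). Real input would be mouth-region images.
train_images = [stripes(True, 0), stripes(True, 1),
                stripes(False, 0), stripes(False, 1)]
train_labels = ["A", "A", "U", "U"]
train_features = [hog_features(img) for img in train_images]

query_features = hog_features(stripes(True, 2))
predicted = knn_predict(train_features, train_labels, query_features, k=3)
print(predicted)
```

In practice a library implementation of the descriptor and of the support vector machine would replace both functions; the sketch only shows how gradient-orientation histograms turn an image into a fixed-length vector that any of the compared classifiers can consume.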