SpeechPlay: composing and sharing expressive speech through visually augmented text

  • Authors:
  • Kian Peen Yeo; Suranga Nanayakkara

  • Affiliations:
  • Singapore University of Technology and Design, Singapore (both authors)

  • Venue:
  • Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration
  • Year:
  • 2013

Abstract

SpeechPlay allows users to create and share expressive synthetic voices in a fun and interactive manner. It promotes a new level of self-expression and public communication by adding expressiveness to plain text. Prosody in the synthesized speech output is controlled by the visual appearance of the text, which can be manipulated with touch gestures. Users can create and modify content on their mobile phone (the SpeechPlay Mobile application) and publish and share their work on a large screen (SpeechPlay Surface). Initial user reactions suggest that the correlation between the visual appearance of a text phrase and the resulting audio was intuitive. Besides making the speech output more expressive, users could also playfully distort the naturalness of the voice. SpeechPlay could also be a useful tool for music composers and for training new musicians.
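The abstract describes mapping the visual appearance of text to prosody in the synthesized output. A minimal sketch of one such mapping is shown below; the specific attributes (`font_scale`, `baseline_shift`) and the formulas relating them to pitch, rate, and volume are illustrative assumptions, not SpeechPlay's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StyledWord:
    """A word with hypothetical visual attributes set by touch gestures."""
    text: str
    font_scale: float      # 1.0 = default size; user enlarges via pinch gesture
    baseline_shift: float  # > 0 = word raised above the text baseline


def prosody_for(word: StyledWord,
                base_pitch_hz: float = 120.0,
                base_rate_wpm: float = 150.0) -> dict:
    """Map visual appearance to prosody parameters.

    Assumed mapping (for illustration only): raised text raises pitch;
    larger text is spoken more slowly and loudly.
    """
    pitch = base_pitch_hz * (1.0 + 0.5 * word.baseline_shift)
    rate = base_rate_wpm / word.font_scale          # bigger text -> slower
    volume = min(1.0, 0.6 * word.font_scale)        # bigger text -> louder
    return {"pitch_hz": round(pitch, 1),
            "rate_wpm": round(rate, 1),
            "volume": round(volume, 2)}


# A two-word phrase where the second word was enlarged and raised by gestures.
phrase = [StyledWord("hello", 1.0, 0.0), StyledWord("WORLD", 2.0, 0.4)]
for w in phrase:
    print(w.text, prosody_for(w))
```

The per-word prosody dictionaries could then be passed to any speech synthesizer that accepts pitch, rate, and volume parameters, e.g. via SSML `<prosody>` attributes.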