When users access information from text, they engage in strategic fixation: they visually scan the text to focus on regions of interest. Because speech is both serial and ephemeral, however, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcript-centric access involves transcribing speech using automatic speech recognition (ASR) and enriching that transcript with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan either the transcript as a whole or the additional visual cues within it, and then to fixate on and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an index-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take the manual notes that JotMail relies on as retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of potential interest, e.g., telephone numbers, within the transcript. Laboratory and field trials showed that SCANMail overcame most of the problems users reported with JotMail, supporting scanning, information extraction and archiving.
Importantly, our evaluations showed that, despite recognition errors, ASR transcripts provide a highly effective tool for browsing. Users exploited the enriched transcript to determine the gist of the underlying speech and to identify the regions of speech that were critical for them to play. Long-term field trials also showed the utility of transcripts for supporting notification and mobile access.
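The idea of a time-aligned, enriched transcript can be illustrated with a minimal sketch. It is not SCANMail's implementation: the `Word` record, the `find_phone_regions` helper, the phone-number pattern, and the assumption that the recognizer emits spoken digits as numerals with per-word timestamps are all hypothetical choices made for illustration. The sketch locates telephone numbers in the transcript text and maps each one back to the span of audio a user would play.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its (assumed) audio timestamps in seconds."""
    text: str
    start: float
    end: float


# Illustrative pattern only: three digit groups (3-3-4) separated by
# a space, dot, or hyphen. Real extraction would need to be far more robust.
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def find_phone_regions(words):
    """Return (matched_text, start_time, end_time) for each phone number."""
    spans = []   # character span of each word in the joined transcript
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    text = " ".join(w.text for w in words)

    regions = []
    for m in PHONE.finditer(text):
        # Words whose character span overlaps the match
        hit = [w for w, (s, e) in zip(words, spans)
               if s < m.end() and e > m.start()]
        regions.append((m.group(), hit[0].start, hit[-1].end))
    return regions


transcript = [
    Word("call", 0.0, 0.3), Word("me", 0.3, 0.5), Word("at", 0.5, 0.7),
    Word("555", 0.7, 1.4), Word("123", 1.4, 2.0), Word("4567", 2.0, 2.9),
    Word("tomorrow", 2.9, 3.5),
]
print(find_phone_regions(transcript))
# [('555 123 4567', 0.7, 2.9)]
```

An interface built on such regions could highlight the matched text in the transcript and start playback at the returned start time, which is the kind of visual cue that lets a user fixate on a region of interest without listening to the whole message.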