A comparative study of voice and graphical user interfaces with respect to literacy levels

  • Authors:
  • Priyanka Chandel; Devanuj; Pankaj Doke

  • Affiliations:
  • Tata Consultancy Services Ltd., Mumbai, India; IIT Bombay, Mumbai, India; Tata Consultancy Services Ltd., Mumbai, India

  • Venue:
  • Proceedings of the 3rd ACM Symposium on Computing for Development
  • Year:
  • 2013

Abstract

The visual and aural channels are the two most important channels of information processing. Although most human-computer interaction has been designed around the visual channel, there are circumstances in which voice-based interaction becomes preferable, and in some cases necessary, because speech comes naturally to humans and can easily be used by illiterate people. Voice User Interfaces (VUIs), however, are linear and non-persistent, which has serious implications for working memory load [10, 11]. Compared with a visual interface, a VUI (here, an Interactive Voice Response (IVR) system) is slow because access is sequential rather than random. Moreover, however robust a Speech Recognition (SR) platform may be, it can never achieve 100% accuracy, which makes the interaction error-prone. Speech interaction may also demand more user attention and take longer to complete tasks than a Graphical User Interface (GUI). Although significant work has been done to improve IVR systems, the technology has still not been exploited to its fullest. Most of this work proposes a one-size-fits-all solution [13, 14, 15], yet every individual has a different level of knowledge, skill and literacy. A human operator is still better at handling people with such differing characteristics and is therefore usually preferred over IVR [12]. On the other hand, GUIs demand both mental and physical attention: a user cannot operate a GUI when the hands and/or the visual channel are busy with other tasks.

In this paper, we compare the performance of two types of interface, an SR-based VUI and a GUI, and relate the results to the user's level of literacy. We conducted a 2×2 study with two goals and two groups of participants: semiliterate (class 7 to class 11 pass) and literate (undergraduate to PhD students). The first goal was to check seat availability on a long-distance train and the second was to book a ticket. Each user in both groups was asked to perform the task of reserving a railway ticket with both the voice and the visual interface. The GUI was a dedicated application on a mobile phone; the VUI was an IVR application using speech recognition. Every user attempted the task four times, twice with the GUI and twice with the VUI. We recognized that the two interfaces are different in nature, but we still wanted to compare their overall efficacy. We therefore measured performance twice for each mode of interaction (the first and second iterations), so that improvement could be ascertained for each interface.

We designed and conducted the experiment with 18 participants, all male and from Mumbai, with a mean age of 26 years (SD 5.05 years). The equipment consisted of (1) a mobile phone with a ticketing application, for the GUI, and (2) an IVR system combined with speech recognition, for the VUI. The VUI did not use true SR; it was based on the Wizard of Oz (WoZ) methodology [6]. We chose WoZ to control for errors due to faulty speech recognition, which have no equivalent in a GUI. Participants were not aware that the experimenter was controlling the interaction.

In our experiment, the VUI followed a question-and-answer pattern, which is easy for a semiliterate person to understand and does not require the user to retain information. The GUI likewise did not require participants to memorize anything, but it did demand text comprehension. We found no significant change between the first and second iterations for semiliterate participants using the VUI. However, when we split the steps into thinking and non-thinking steps, we found that semiliterate participants took more time on steps that required thinking, for example data confirmation and train confirmation. In our setup, SR (simulated by Wizard of Oz) did away with complex menu structures, which could help people with low literacy. However, if the VUI imposes too much memory load, it would not be very beneficial for semiliterate users. In the VUI, when participants were asked to confirm data, the difference between the first and second iterations was considerable, and semiliterate participants had more difficulty with confirmation than literate participants.

The parameters studied during the experiment were:
  • Cognitive time taken by the participant
  • Total time taken by the participant

We found that performance improved more with the GUI, for both types of user; with the VUI there was essentially no significant improvement. Given that none of our users had previously booked tickets through a VUI, this may suggest that the VUI performs well from the first exposure, leaving much less scope for improvement. Comparing literate and semiliterate users, improvement was greater for literate users with both interfaces, though it was more pronounced with the GUI. Participants took less time to complete the task with the VUI. This could be because the application did not present a sequence of options or a hierarchy, which helped participants interact easily and quickly. With the GUI, participants had to read and interpret the options before responding; the GUI therefore required text comprehension, which is seen to be better in literate than in semiliterate people.
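The abstract reports timings split into thinking and non-thinking steps and compares the first and second iterations for each interface and literacy group, but it does not say how improvement was computed. As an illustration only, the following Python sketch assumes that each step of each attempt was logged with its duration and a thinking/non-thinking label, and defines improvement as the relative drop in time from the first to the second iteration. The record layout, the names `StepTiming`, `total_time` and `mean_improvement`, and this definition of improvement are hypothetical, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical log record for one timed step of one attempt.
# Field names are illustrative assumptions, not the study's instrumentation.
@dataclass
class StepTiming:
    participant: str
    group: str        # "literate" or "semiliterate"
    interface: str    # "GUI" or "VUI"
    iteration: int    # 1 or 2
    step: str         # e.g. "data confirmation", "train confirmation"
    seconds: float
    thinking: bool    # assumed label: True if the step required a decision


def total_time(records, participant, interface, iteration, thinking=None):
    """Sum step times for one attempt.

    If `thinking` is True or False, only steps with that label are counted;
    if it is None, all steps are counted.
    """
    return sum(
        r.seconds
        for r in records
        if r.participant == participant
        and r.interface == interface
        and r.iteration == iteration
        and (thinking is None or r.thinking == thinking)
    )


def mean_improvement(records, group, interface, thinking=None):
    """Mean relative time reduction from iteration 1 to iteration 2.

    Improvement for one participant is (t1 - t2) / t1 (an assumed definition).
    Participants with no time in iteration 1 are skipped; returns 0.0 if the
    group has no usable participants.
    """
    participants = sorted({r.participant for r in records if r.group == group})
    gains = []
    for p in participants:
        t1 = total_time(records, p, interface, 1, thinking)
        t2 = total_time(records, p, interface, 2, thinking)
        if t1 > 0:
            gains.append((t1 - t2) / t1)
    return mean(gains) if gains else 0.0
```

For example, a participant whose GUI attempt took 120 s in the first iteration and 60 s in the second would contribute an improvement of 0.5. The sketch includes no significance test, since the abstract does not state which test, if any, underlies its judgements of significance.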