Eye typing with common cameras

  • Authors:
  • Dan Witzner Hansen;John Paulin Hansen

  • Affiliations:
  • IT University, Copenhagen;IT University, Copenhagen

  • Venue:
  • Proceedings of the 2006 symposium on Eye tracking research & applications
  • Year:
  • 2006

Quantified Score

Hi-index 0.01

Abstract

Low-cost eye tracking has received increased attention due to rapid developments in tracking hardware (video boards, digital cameras and CPUs) [Hansen and Pece 2005; OpenEyes 2005]. We present a gaze typing system based on components that can be bought in most consumer hardware stores around the world. These components, for example cameras and graphics cards, are made in large quantities. This kind of hardware differs from what is often claimed to be "off-the-shelf components" but is in fact available only from particular vendors.

Institutions that supply citizens with communication aids may be reluctant to invest large amounts of money in new equipment that they are unfamiliar with. Recent investigations estimate that fewer than 2000 systems have actually been used by Europeans, even though more than half a million disabled people in Europe could potentially benefit from them. The main group of present users consists of people with motor neuron disease (MND) and amyotrophic lateral sclerosis (ALS). If the price of gaze communication systems can be lowered, gaze communication could become a preferred means of control for a large group of people [Jordansen et al. 2005]. Present commercial gaze trackers, e.g. [Tobii 2005; LC-Technologies 2004], are easy to use, robust and sufficiently accurate for many screen-based applications, but their costs exceed the budget of most people.

We use a standard, uncalibrated $400 Sony consumer camera (Sony Handycam DCR-HC14E) to obtain the image data. The camera is stationary, placed on a tripod close to the monitor (the exact position varies), but the geometry of the user, monitor and camera varies among sequences. In all cases, however, the users sit about 50-60 cm away from a 17" screen. A typical example of the setup is shown in figure 1.
We use Sony's standard 'night vision' video option to create a glint with the built-in IR light emitter.

Eye tracking based on common components is subject to several unknown factors, as various system parameters (i.e. camera parameters and geometry) are unknown. Algorithms that employ robust statistical principles to accommodate uncertainties in image data as well as in gaze estimates during typing are therefore needed. We propose to use the RANSAC algorithm [Fischler and Bolles 1981] both for robust maximum likelihood estimation of iris observations [Hansen and Pece 2005] and for handling outliers in the calibration procedure [Morimoto et al. 2000].

Our low-resolution gaze tracker can be calibrated in less than 3 minutes by looking at 9 predefined positions on the screen. The users sit on a standard office chair without headrests or other physical constraints. Under these conditions we have succeeded in tracking the gaze of people, obtaining accuracies of about 160 pixels on screen. This is still less accurate than what is claimed for the best current off-the-shelf eye tracking systems (i.e. 30-60 pixels). However, a direct comparison with these eye trackers would not be fair, as they are based on different hardware and image data.

Low-cost gaze trackers do not need to be as accurate and robust as the commercial systems if they are used together with applications designed to tolerate noisy inputs.

We use components of the GazeTalk [COGAIN 2005] typing communication system and have, through proper design of the typing interface, reduced the need for high accuracy. We have observed typing speeds in the range of 3-5 words per minute for untrained subjects using large on-screen buttons and a new noise-tolerant dwell-time principle. We modify the traditional dwell-time activation into one that maintains a full distribution over all hypothetical button selections and activates a button when the evidence becomes high enough.
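
As an illustration of how RANSAC can handle outliers in a 9-point calibration, the following is a minimal sketch and not the authors' implementation. It assumes that each calibration target yields one 2D eye feature (for instance a pupil-glint vector) and that an affine map from features to screen pixels is adequate; the abstract does not specify the mapping model. The function names, the 40-pixel inlier threshold and the iteration count are all hypothetical choices.

```python
import numpy as np

def fit_affine(features, targets):
    """Least-squares affine map so that targets ~ [features, 1] @ coeffs."""
    design = np.hstack([features, np.ones((len(features), 1))])
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs  # shape (3, 2)

def apply_affine(coeffs, features):
    """Map eye features to screen coordinates with a fitted affine map."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ coeffs

def ransac_calibrate(features, targets, n_iter=200, inlier_px=40.0, seed=0):
    """Fit the map on the largest consensus set found by random sampling.

    inlier_px and n_iter are assumed values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        # Three point pairs are the minimal sample for an affine map.
        sample = rng.choice(n, size=3, replace=False)
        coeffs = fit_affine(features[sample], targets[sample])
        errors = np.linalg.norm(apply_affine(coeffs, features) - targets, axis=1)
        inliers = errors < inlier_px
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_affine(features[best_inliers], targets[best_inliers]), best_inliers
```

A calibration sample recorded while the user blinked or glanced away would typically end up outside the consensus set, so it does not distort the final mapping.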
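
The noise-tolerant dwell-time idea, keeping a distribution over all candidate buttons and activating one once the evidence is high enough, could be sketched as follows. This is one possible reading, not the GazeTalk implementation: it assumes isotropic Gaussian gaze noise around each button centre, with a standard deviation set to the reported accuracy of roughly 160 pixels, and an activation threshold of 0.95. The class name `EvidenceDwell` and all of its parameters are hypothetical.

```python
import numpy as np

class EvidenceDwell:
    """Accumulates gaze evidence per button instead of timing one fixation."""

    def __init__(self, centers, sigma_px=160.0, threshold=0.95):
        self.centers = np.asarray(centers, dtype=float)  # button centres, pixels
        self.sigma_px = sigma_px      # assumed gaze noise (standard deviation)
        self.threshold = threshold    # assumed posterior needed to activate
        self.reset()

    def reset(self):
        """Start from a uniform belief over all buttons."""
        n = len(self.centers)
        self.log_belief = np.full(n, -np.log(n))

    def update(self, gaze_xy):
        """Fold in one gaze sample; return a button index or None."""
        gaze = np.asarray(gaze_xy, dtype=float)
        sq_dist = np.sum((self.centers - gaze) ** 2, axis=1)
        # Gaussian log-likelihood of the sample for each button (up to a constant).
        self.log_belief += -0.5 * sq_dist / self.sigma_px ** 2
        # Normalise in log space so the beliefs sum to one.
        self.log_belief -= np.logaddexp.reduce(self.log_belief)
        best = int(np.argmax(self.log_belief))
        if np.exp(self.log_belief[best]) >= self.threshold:
            self.reset()
            return best
        return None
```

With a scheme of this kind, a single stray gaze sample shifts the belief only slightly rather than restarting a dwell timer, which is one way noisy gaze estimates can be tolerated with large on-screen buttons.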