Understanding Speech

Chasing gazes

The eyes have often been described as a window into the human mind. When we hear spoken language about things present in our immediate surroundings, our gaze (largely automatically) searches for the items or persons the speaker refers to. When we speak to describe something that we see, our eyes fixate upon parts of what we perceive as our description unfolds. Because where we look is so tightly linked to the processing of linguistic information about what we see, tracking eye movements has become a well-established tool for psycholinguists to study the time course of listening and speaking.

When looking at a scene, the eyes constantly perform rapid jumps, so-called saccades, between fixations, periods during which the gaze rests upon a particular point. During a saccade we perceive virtually nothing. What we experience as a stable image of our surroundings is essentially an image computed by our brain from these many fixations.
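To make the distinction between saccades and fixations concrete, the sketch below classifies a stream of gaze samples with a simple velocity threshold. The sample data, sampling rate, screen geometry, and threshold value are illustrative assumptions, not our lab's actual processing pipeline.

```python
# Minimal sketch: classify gaze samples into fixations and saccades with a
# simple velocity threshold. All numeric values are illustrative assumptions.
import math

SAMPLE_RATE_HZ = 1000          # assumed recording rate (the EyeLink supports up to 1000 Hz)
VELOCITY_THRESHOLD = 30.0      # assumed threshold in degrees of visual angle per second
PIXELS_PER_DEGREE = 35.0       # assumed conversion factor for the screen geometry

def classify_samples(samples):
    """samples: list of (x, y) gaze positions in pixels, one per millisecond.
    Returns a label per sample: 'fixation' or 'saccade'."""
    labels = []
    for i in range(len(samples)):
        if i == 0:
            labels.append("fixation")
            continue
        (x0, y0), (x1, y1) = samples[i - 1], samples[i]
        # Distance moved between two consecutive samples, converted to degrees.
        dist_deg = math.hypot(x1 - x0, y1 - y0) / PIXELS_PER_DEGREE
        velocity = dist_deg * SAMPLE_RATE_HZ   # degrees of visual angle per second
        labels.append("saccade" if velocity > VELOCITY_THRESHOLD else "fixation")
    return labels

# Example: a gaze that rests, jumps quickly to a new location, then rests again.
gaze = [(400, 300)] * 50 + [(400 + 20 * i, 300) for i in range(10)] + [(600, 300)] * 50
print(classify_samples(gaze).count("saccade"), "samples classified as saccade")
```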

Our eye tracking lab is located at the "Haus des Hörens" audiological research center. We use an SR Research EyeLink CL Remote eye tracking device that can record eye movements and fixations at a sampling rate of up to 1000 Hz. The device can be operated with a headrest or in remote mode. Tracking the gaze of a subject with this system works through high-speed image analysis of an infrared video stream. The device consists of three parts: an infrared camera pointed at the subject, an infrared light source, and the computer running the analysis software. In order to estimate the position of the gaze, the system keeps track of the elliptical shape of the pupil. The infrared light illuminating the subject creates a characteristic reflection on the cornea, which is also recorded. After an initial calibration, the pupil position and shape information together with the position of the corneal reflection allow for a precise mapping of gaze onto the stimulus picture or scene shown to the subject.
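The basic idea behind the calibration step can be illustrated with a short sketch: the difference vector between the pupil center and the corneal reflection is mapped to screen coordinates with a polynomial fit estimated from fixations on known calibration targets. The feature model, the nine-point calibration data, and all function names below are assumptions for illustration, not the EyeLink's internal algorithm.

```python
# Minimal sketch of pupil/corneal-reflection calibration: fit a polynomial
# mapping from the pupil-minus-CR vector to screen coordinates.
import numpy as np

def design_matrix(px, py):
    """Second-order polynomial terms of the pupil-minus-CR difference vector."""
    return np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])

def calibrate(pupil_cr, screen_xy):
    """Fit mapping coefficients from calibration fixations.
    pupil_cr:  (n, 2) pupil-minus-CR vectors recorded while the subject
               fixated n known calibration targets.
    screen_xy: (n, 2) target positions on the screen in pixels."""
    X = design_matrix(pupil_cr[:, 0], pupil_cr[:, 1])
    coeffs, *_ = np.linalg.lstsq(X, screen_xy, rcond=None)
    return coeffs

def gaze_position(coeffs, pupil_cr_sample):
    """Map a new pupil-minus-CR vector to an estimated point of regard on screen."""
    X = design_matrix(np.array([pupil_cr_sample[0]]), np.array([pupil_cr_sample[1]]))
    return (X @ coeffs)[0]

# Hypothetical nine-point calibration: fake pupil-CR measurements and the
# known target positions they correspond to.
rng = np.random.default_rng(0)
targets = np.array([[x, y] for y in (100, 400, 700) for x in (200, 800, 1400)], float)
pupil_cr = targets / 50.0 + rng.normal(scale=0.05, size=targets.shape)
coeffs = calibrate(pupil_cr, targets)
print(gaze_position(coeffs, pupil_cr[4]))   # should lie close to the central target
```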

The setup is currently used for language perception experiments in which subjects listen to speech of varying complexity under different noise conditions. We show the subjects black-and-white scene drawings from our OLACS corpus that either match or do not match the sentence they hear and track the course of fixations on the visual display. In a second experiment, we examine how gaze is distributed across a depicted scene while the subjects describe it verbally.
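A typical way to analyze such data is to compute, for each time bin after sentence onset, the proportion of trials in which the gaze falls on a particular region of the display. The sketch below shows this kind of analysis for a single rectangular area of interest; the trial data, region coordinates, and bin sizes are made up for illustration and do not reflect our actual materials.

```python
# Minimal sketch of a fixation-proportion analysis: for each time bin after
# sentence onset, count the proportion of trials with gaze inside a region
# of interest. All data and parameters are illustrative assumptions.
def in_region(x, y, region):
    """region: (left, top, right, bottom) rectangle in screen pixels."""
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom

def fixation_proportions(trials, region, bin_ms=50, window_ms=1000):
    """trials: list of gaze traces, each a list of (t_ms, x, y) samples
    aligned to sentence onset. Returns one proportion per time bin."""
    n_bins = window_ms // bin_ms
    proportions = []
    for b in range(n_bins):
        start, end = b * bin_ms, (b + 1) * bin_ms
        hits = 0
        for trace in trials:
            samples = [(x, y) for t, x, y in trace if start <= t < end]
            # Count the trial if any sample in this bin lies inside the region.
            if samples and any(in_region(x, y, region) for x, y in samples):
                hits += 1
        proportions.append(hits / len(trials))
    return proportions

# Toy example: gaze moves onto the target region around 400 ms after onset.
target_region = (600, 200, 900, 500)
trial = [(t, 300, 300) if t < 400 else (t, 750, 350) for t in range(0, 1000, 2)]
print(fixation_proportions([trial, trial], target_region))
```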