Transcript of Gaze interaction (2): models and technologies

Gaze interaction (2): models and technologies

Corso di Interazione uomo-macchina II

Prof. Giuseppe Boccignone

Dipartimento di Scienze dell’Informazione

Università di Milano

[email protected]
http://homes.dsi.unimi.it/~boccignone/l

A. Vinciarelli, M. Pantic, H. Bourlard, Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing (2008)

Gaze interaction


Gaze estimation without eye trackers

• Problem!

• Eye detection

• detect the existence of eyes

• accurately interpret eye positions in the images, using the pupil or iris center

• for video images, the detected eyes are tracked from frame to frame

• Gaze estimation: the detected eyes in the images are used to estimate and track where a person is looking in 3D or, alternatively, to determine the 3D line of sight.



Eye detection
//eye models

• Identify a model of the eye which is sufficiently expressive to account for the large variability in appearance and dynamics, while also sufficiently constrained to be computationally efficient.

• Even for the same subject, a relatively small variation in viewing angle can cause significant changes in appearance: eyelids may appear straight from one view but highly curved from another, and the iris contour also changes with viewing angle.

[Figure: the dashed lines indicate where the eyelids appear straight; the solid yellow lines represent the major axis of the iris ellipse.]

Eye detection
//eye models

• The eye image may be characterized by

• the intensity distribution of the pupil(s), iris, and cornea,

• their shapes.

• Ethnicity, viewing angle, head pose, color, texture, lighting conditions, the position of the iris within the eye socket, and the state of the eye (i.e., open/closed) are issues that heavily influence the appearance of the eye.

• The intended application and the available image data lead to different prior eye models.

• The prior model representation is often applied at different positions, orientations, and scales to reject false candidates.


Eye detection
//eye models

• Shape-based methods: use a prior model of eye shape and surrounding structures

• fixed shape

• deformable shape

• Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching using a similarity measure

• intensity-based methods

• subspace-based methods

• Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits
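The template-matching idea above (an image-patch model plus a similarity measure) can be sketched with normalized cross-correlation as the similarity measure; the image, patch size, and paste location below are synthetic toy data, not from any cited system.

```python
import numpy as np

def ncc_match(image, template):
    """Slide the template over the image and return the normalized
    cross-correlation (NCC) score map; the peak marks the best match."""
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.full((ih - th + 1, iw - tw + 1), -1.0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom > 0:
                scores[y, x] = (p * t).sum() / denom
    return scores

# Toy data: paste a synthetic "eye patch" into a dark image and recover it.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 0.2, size=(40, 40))
template = rng.uniform(0.6, 1.0, size=(8, 8))
image[12:20, 25:33] = template          # ground-truth location (12, 25)
scores = ncc_match(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

NCC is invariant to local brightness and contrast offsets, which is why it is a common similarity measure for intensity-based matching.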

Eye detection
//eye models: Shape-Based Approaches

• Shape-based methods: use a prior model of eye shape and a similarity measure

• Prior model of eye shape and surrounding structures

• iris and pupil contours and the exterior shape of the eye (eyelids)

• simple elliptical or of a more complex nature

• the parameters of the geometric model define the allowable template deformations and contain parameters for rigid (similarity) transformations and parameters for nonrigid template deformations

• ability to handle shape, scale, and rotation changes


Eye detection
//eye models: Shape-Based Approaches

• Simple Elliptical Shape Models:

• example: Valenti and Gevers

• uses isophote (i.e., curves connecting points of equal intensity) properties to infer the center of (semi)circular patterns which represent the eyes
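A minimal sketch of the isophote idea: each pixel's local derivatives give a displacement vector toward the center of the circular isophote it lies on, and votes are accumulated there. The displacement formula follows the isophote-curvature construction used by Valenti and Gevers; the synthetic radial image and the gradient-magnitude vote weighting are illustrative assumptions.

```python
import numpy as np

def isophote_centers(img):
    """Each pixel votes for the centre of the (semi)circular isophote it
    lies on; the accumulator maximum is the estimated eye centre."""
    Ly, Lx = np.gradient(img)           # first derivatives (rows, cols)
    Lyy, _ = np.gradient(Ly)            # second derivatives
    Lxy, Lxx = np.gradient(Lx)
    denom = Ly**2 * Lxx - 2 * Lx * Lxy * Ly + Lx**2 * Lyy
    mag2 = Lx**2 + Ly**2
    acc = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if abs(denom[y, x]) < 1e-9:
                continue
            # displacement from this pixel towards the isophote centre
            dx = -Lx[y, x] * mag2[y, x] / denom[y, x]
            dy = -Ly[y, x] * mag2[y, x] / denom[y, x]
            cy, cx = int(round(y + dy)), int(round(x + dx))
            if 0 <= cy < h and 0 <= cx < w:
                acc[cy, cx] += np.sqrt(mag2[y, x])   # weight by gradient strength
    return acc

# Synthetic dark iris: intensity grows with distance from centre (15, 20),
# so every isophote is a circle around that point.
yy, xx = np.mgrid[0:31, 0:41]
img = np.sqrt((yy - 15.0)**2 + (xx - 20.0)**2)
acc = isophote_centers(img)
cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
```

On this radial toy image the analytic displacement lands exactly on the centre, so the accumulator peaks at (15, 20) up to discretization.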



Eye detection
//eye models: Shape-Based Approaches

• Simple Elliptical Shape Models:

• example: Webcam-based Visual Gaze Estimation (Valenti et al.)

• uses isophote (i.e., curves connecting points of equal intensity) voting; no head pose required

[Figure: each pixel votes along the direction to the center.]



Eye detection
//eye models: Shape-Based Approaches

• Simple Elliptical Shape Models:

• example: Webcam-based Visual Gaze Estimation (Valenti et al.)

• uses a scale-space framework for multiresolution processing


Eye detection
//eye models: Shape-Based Approaches

• Simple Elliptical Shape Models:

• example: Webcam-based Visual Gaze Estimation (Valenti et al.)

• simple interpolants for easy calibration

Eye detection

//eye models: Shape-Based Approaches

• Complex Shape Models:

• example: Yuille deformable templates
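Yuille's full template couples edge, valley, and peak potentials over a parameterized eye template minimized by gradient descent. As a much-reduced sketch of the same energy-minimization idea, the toy below grid-searches a circle template that minimizes interior intensity against a surrounding bright ring (dark-pupil assumption; the energy terms and search ranges are illustrative, not Yuille's).

```python
import numpy as np

def fit_dark_circle(img, radii):
    """Grid-search circle parameters (cx, cy, r) minimising an energy:
    mean intensity inside the circle minus mean intensity of a ring just
    outside it. A dark pupil on a bright background gives the lowest
    energy when the circle matches the pupil boundary exactly."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    best, best_e = None, np.inf
    for r in radii:
        for cy in range(r + 2, h - r - 2):
            for cx in range(r + 2, w - r - 2):
                d2 = (yy - cy)**2 + (xx - cx)**2
                inside = d2 <= r * r
                ring = (d2 > r * r) & (d2 <= (r + 2)**2)
                e = img[inside].mean() - img[ring].mean()
                if e < best_e:
                    best, best_e = (cx, cy, r), e
    return best

# Synthetic frame: bright background, dark pupil at (cx=14, cy=9), r=4.
img = np.ones((20, 30))
yy, xx = np.mgrid[0:20, 0:30]
img[(yy - 9)**2 + (xx - 14)**2 <= 16] = 0.1
cx, cy, r = fit_dark_circle(img, radii=[3, 4, 5])
```

Real deformable templates replace the exhaustive search with gradient descent over many parameters, which is exactly why they need a good initialization, as the next slide notes.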


Eye detection
//eye models: Shape-Based Approaches

• Complex Shape Models:

• 1. computationally demanding,

• 2. may require high-contrast images, and

• 3. usually need to be initialized close to the eye for successful localization. For large head movements, they consequently need other methods to provide a good initialization.


Eye detection
//eye models: Feature-Based Shape Methods

• Explore the characteristics of the human eye to identify a set of distinctive features around the eyes.

• The limbus, pupil (dark/bright pupil images), and corneal reflections are common features used for eye localization

• Local Features by Intensity

• The eye region contains several boundaries that may be detected by gray-level differences

• Local Features by Filter Responses

• Filter responses enhance particular characteristics in the image while suppressing others. A filter bank may therefore enhance desired features of the image and, if appropriately defined, deemphasize irrelevant features



Eye detection
//eye models: Feature-Based Shape Methods

• Local Features by Intensity

• The eye region contains several boundaries that may be detected by gray-level differences (Harper et al.)
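The specific gray-level search of Harper et al. is not detailed on the slide; a generic way to exploit gray-level differences is the integral projection: dark structures such as the iris and pupil produce minima in the row and column intensity averages (the patch below is synthetic).

```python
import numpy as np

def projection_minima(img):
    """Integral projection: average intensity per row and per column.
    Dark structures (iris/pupil) show up as minima in both projections,
    giving a coarse eye position from gray-level differences alone."""
    row_proj = img.mean(axis=1)
    col_proj = img.mean(axis=0)
    return int(np.argmin(row_proj)), int(np.argmin(col_proj))

# Synthetic face patch: uniform skin with one dark eye blob near (6, 18).
img = np.full((24, 32), 0.8)
img[5:8, 16:21] = 0.1
y, x = projection_minima(img)
```

Projections are cheap (one pass over the image), which is why intensity methods often use them as a first, coarse localization before a finer search.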

Eye detection
//eye models: Feature-Based Shape Methods

• Local Features by Intensity

[Figure: sequential search strategy.]



Eye detection
//eye models: Feature-Based Shape Methods

• Local Features by Filter Responses

• Filter responses enhance particular characteristics in the image while suppressing others

• Example: Sirohey and Rosenfeld

• Edges of the eye's sclera are detected with four Gabor wavelets. A nonlinear filter is constructed to detect the left and right eye-corner candidates.

• The eye corners are used to determine eye regions for further analysis. Postprocessing steps are employed to eliminate spurious eye-corner candidates.

• A voting method is used to locate the edge of the iris. Since the upper part of the iris may not be visible, the votes are accumulated by summing edge pixels in a U-shaped annular region. The annulus center receiving the most votes is selected as the iris center.

• To detect the edge of the upper eyelid, all edge segments in the eye region are examined and fitted to a third-degree polynomial.
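A minimal sketch of the filter-response idea: a zero-mean real Gabor kernel responds to edges at its own orientation and ignores flat regions. The wavelength, sigma, and two-orientation bank below are assumed values for illustration; the parameters of Sirohey and Rosenfeld's actual four wavelets are not given on the slide.

```python
import numpy as np

def gabor_kernel(theta, wavelength=4.0, sigma=2.0, size=9):
    """Real (even) Gabor kernel: a sinusoid at orientation theta under a
    Gaussian envelope; responds strongly to bars/edges at that orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the wave
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    k = gauss * np.cos(2 * np.pi * xr / wavelength)
    return k - k.mean()          # zero mean: flat regions give no response

def convolve2d_valid(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y + kh, x:x + kw] * k).sum()
    return out

# A vertical step edge excites the 0-rad kernel (wave along x) far more
# than the pi/2 kernel (wave along y).
img = np.zeros((20, 20))
img[:, 10:] = 1.0
bank = [gabor_kernel(t) for t in (0.0, np.pi / 2)]
responses = [np.abs(convolve2d_valid(img, k)).max() for k in bank]
```

A bank of such kernels at several orientations is what lets the method separate the roughly vertical sclera edges from other structure in the eye region.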


Eye detection
//eye models

• Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching using a similarity measure

• intensity-based methods

• subspace-based methods


Eye detection
//eye models

• intensity-based methods (example: Grauman et al.)

• During the first stage of processing, the eyes are automatically located by searching temporally for "blink-like" motion
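The blink-search stage can be sketched as simple frame differencing: pixels whose intensity changes strongly between consecutive frames form a "blink-like" motion blob whose centroid localizes the eye. The threshold, the single-blob handling, and the synthetic frames below are illustrative simplifications, not Grauman et al.'s exact pipeline.

```python
import numpy as np

def blink_candidates(prev, curr, thresh=0.3, min_pixels=4):
    """Threshold the inter-frame difference; a connected region of change
    is a 'blink-like' motion candidate. Simplified here to the centroid
    of all changed pixels (real systems segment and track each blob)."""
    diff = np.abs(curr.astype(float) - prev.astype(float)) > thresh
    if diff.sum() < min_pixels:
        return None              # too little motion: no blink candidate
    ys, xs = np.nonzero(diff)
    return float(ys.mean()), float(xs.mean())

# Two synthetic frames: eye open (dark blob), then closed (blob vanishes).
open_frame = np.full((20, 30), 0.8)
open_frame[8:11, 12:17] = 0.1            # dark open eye around (9, 14)
closed_frame = np.full((20, 30), 0.8)    # eyelid covers the eye
centroid = blink_candidates(open_frame, closed_frame)
```

The appeal of this cue is that blinking is involuntary and periodic, so the detector needs no manual initialization.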


Eye detection
//eye models

• Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching using a similarity measure

• subspace methods (eigeneyes)


Eye detection
//eye models

• subspace methods (eigeneyes)

• How can we find an efficient representation of such a data set?

• Rather than storing every image, we might try to represent the images more effectively, e.g., in a lower-dimensional subspace

• We seek a linear basis with which each image in the ensemble is approximated as a linear combination of basis images

Eye detection
//eye models

• subspace methods (eigeneyes)

• let's select the basis to minimize the squared reconstruction error


Eye detection
//eye models

• subspace methods (eigeneyes)

• The eigenvectors of the sample covariance matrix of the image data provide the major axes
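The eigeneye construction follows directly from the slides: center the training vectors, take the eigenvectors of their sample covariance, and measure how well a new patch is reconstructed from the top-k basis images (detection by reconstruction error). The tiny two-factor ensemble below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ensemble: each "eye image" is a linear combination of two fixed
# 64-dimensional basis patterns plus a little noise.
b1, b2 = rng.normal(size=(2, 64))
coeffs = rng.normal(size=(50, 2))
X = coeffs @ np.vstack([b1, b2]) + 0.01 * rng.normal(size=(50, 64))

# Eigeneyes: eigenvectors of the sample covariance of the centred data.
mean = X.mean(axis=0)
Xc = X - mean
cov = Xc.T @ Xc / (len(X) - 1)
evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
eigeneyes = evecs[:, ::-1]              # principal axes first

def reconstruct(x, k):
    """Project onto the top-k eigeneyes and back. A small residual means
    the patch is well explained by the eye subspace."""
    basis = eigeneyes[:, :k]
    return mean + basis @ (basis.T @ (x - mean))

x = X[0]
err2 = np.linalg.norm(x - reconstruct(x, 2))    # 2 axes already suffice
err64 = np.linalg.norm(x - reconstruct(x, 64))  # full basis: exact
```

Because the data were generated from two factors, two eigeneyes capture almost all the variance, which is exactly the lower-dimensional-subspace argument on the slide.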


Eye detection
//in summary

• Shape-based methods: use a prior model of eye shape and surrounding structures

• fixed shape

• deformable shape

• Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching using a similarity measure

• intensity-based methods

• subspace-based methods

• Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits

• Other methods: eye trackers using active (IR) light; we have already considered these


Gaze estimation

• Gaze:

• the gaze direction

• the point of regard (PoR, or fixation)

• Gaze modeling consequently focuses on the relations between the image data and the point of regard/gaze direction.

Gaze estimation
//some general problems

• 1. camera calibration: determining intrinsic camera parameters;

• 2. geometric calibration: determining the relative locations and orientations of the different units in the setup, such as camera, light sources, and monitor;

• 3. personal calibration: estimating cornea curvature and the angular offset between visual and optical axes; and

• 4. gaze-mapping calibration: determining the parameters of the eye-gaze mapping functions.
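Step 4 can be sketched for the common 2D regression case: fit a second-order polynomial from an eye feature vector (e.g., the pupil-glint vector) to screen coordinates by least squares over the calibration points. The feature set, the 9-point grid, and the ground-truth mapping below are illustrative assumptions, not a specific tracker's calibration.

```python
import numpy as np

def poly_features(v):
    """Second-order polynomial terms of the eye vector (vx, vy), a common
    choice for 2D regression-based gaze mapping."""
    vx, vy = v
    return np.array([1.0, vx, vy, vx * vy, vx**2, vy**2])

def calibrate(eye_vectors, screen_points):
    """Gaze-mapping calibration: fit the mapping coefficients by least
    squares on (eye vector, screen point) calibration pairs."""
    A = np.array([poly_features(v) for v in eye_vectors])
    W, *_ = np.linalg.lstsq(A, np.array(screen_points), rcond=None)
    return W                     # (6, 2) coefficient matrix

def map_gaze(W, v):
    return poly_features(v) @ W

# Synthetic calibration: a 9-point grid with a known quadratic ground truth.
def truth(v):
    return (100 + 80 * v[0] + 5 * v[0]**2, 60 + 50 * v[1] + 3 * v[0] * v[1])

grid = [(vx, vy) for vx in (-1.0, 0.0, 1.0) for vy in (-1.0, 0.0, 1.0)]
W = calibrate(grid, [truth(v) for v in grid])
pred = map_gaze(W, (0.5, -0.25))     # an eye vector not in the grid
```

Because the ground truth lies inside the polynomial model class, the 9-point fit recovers it exactly; real calibrations only approximate the mapping and degrade with head movement.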


Gaze estimation
//methods

• IR light and feature extraction:

• 2D Regression-Based Gaze Estimation

• 3D Model-Based Gaze Estimation

• Appearance-based methods

• Similarly to the appearance models of the eyes, appearance-based models for gaze estimation do not explicitly extract features, but rather use the image contents as input with the intention of mapping these directly to screen coordinates (PoR).

• do not require calibration of cameras and geometry data, since the mapping is made directly on the image contents

• Natural light methods

Gaze estimation
//methods

• Natural light methods

• Natural-light approaches face several new challenges, such as light changes in the visible spectrum and lower-contrast images, but are not as sensitive to IR light in the environment, and may thus be better suited for outdoor use.


Gaze estimation
//methods

• Appearance-based methods

• Example: K.-H. Tan, D.J. Kriegman, and N. Ahuja: appearance manifold model

• treat an image as a point in a high-dimensional space: a 20-pixel by 20-pixel intensity image can be considered a 400-component vector, or a point in a 400-dimensional space (appearance manifold)

[Figure: each manifold point s1, s2, s3, ... is an image of an eye, labeled with the 2D coordinates of a point on a display.]
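A sketch of querying such a manifold: find the nearest stored eye images in appearance space and interpolate their screen labels. Tan et al. use a locally linear interpolation over manifold neighbors; the inverse-distance weighting and the synthetic linear appearance model below are simplifications for illustration.

```python
import numpy as np

def estimate_por(sample, images, labels, k=3):
    """Nearest-neighbour interpolation on the appearance manifold: the k
    closest stored eye images vote for the point of regard, weighted by
    inverse appearance distance."""
    d = np.linalg.norm(images - sample, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)
    w /= w.sum()
    return w @ labels[idx]

# Toy manifold: appearance is a fixed linear function of the gaze point,
# sampled on a 5x5 grid of screen positions.
rng = np.random.default_rng(2)
proj = rng.normal(size=(2, 400))    # gaze point -> 400-dim "eye image"
labels = np.array([(x, y) for x in range(0, 101, 25)
                           for y in range(0, 101, 25)], float)
images = labels @ proj
sample = np.array([50.0, 50.0]) @ proj   # appearance of gaze at (50, 50)
por = estimate_por(sample, images, labels)
```

The sanity check here queries a stored appearance and recovers its own label; between grid points the estimate is an interpolation, which is why denser calibration grids improve accuracy.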


Gaze estimation
//methods

• Appearance-based methods

• Example: Williams, Blake, and Cipolla: mapping images to continuous output spaces using powerful Bayesian learning techniques



Gaze estimation
//methods

• Example: Williams, Blake, and Cipolla: mapping images to continuous output spaces using powerful Bayesian learning techniques

• Rather than using raw pixel data, input images are processed to obtain different types of features

• To infer the input-output mapping for unseen inputs in real time: a sparse regression model (Gaussian Processes)

• The method is fully Bayesian: output predictions are provided with a measure of uncertainty

• During the learning phase, all unknown modelling parameters are inferred from data as part of the Bayesian framework: known dynamics are not required a priori.
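The underlying machinery can be sketched with plain (non-sparse) Gaussian-process regression: the posterior mean gives the prediction and the posterior variance gives the advertised measure of uncertainty. The 1-D toy feature, RBF kernel, and noise level below are assumptions; the sparse formulation itself is not reproduced here.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential (RBF) kernel between 1-D input vectors."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_predict(X, y, Xs, noise=1e-4):
    """GP regression: posterior mean and variance at test inputs Xs given
    training pairs (X, y). The variance is the per-prediction uncertainty."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kss = rbf(Xs, Xs)
    Kinv = np.linalg.inv(K)
    mean = Ks @ Kinv @ y
    var = np.diag(Kss - Ks @ Kinv @ Ks.T)
    return mean, var

# 1-D toy: an eye feature maps linearly to a screen coordinate.
X = np.linspace(0.0, 1.0, 8)
y = 100.0 * X
mean, var = gp_predict(X, y, np.array([0.5, 3.0]))
```

Inside the training range (x = 0.5) the prediction is accurate and the variance tiny; far outside it (x = 3.0) the variance grows toward the prior, which is how a gaze system can flag unreliable estimates.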


Gaze estimation
//methods

• Appearance-based methods

• Example: Williams, Blake, and Cipolla: mapping images to continuous output spaces using powerful Bayesian learning techniques

• Can be applied to other contexts


Gaze estimation
//using other cues

Gaze estimation
//head-tracking

• The Watson head-tracker

• a real-time object tracker that uses range and appearance information from a stereo camera to recover the 3D rotation and translation of objects, or of the camera itself

• the system can be connected to a face detector and used as an accurate head tracker

• additional supporting algorithms can improve the accuracy of the tracker

• Software download: http://groups.csail.mit.edu/vision/vip/watson/index.htm


The Watson head tracker
//head pointing


The Watson head tracker
//Interactive Kiosk

Shared attention

• Shared attention through gaze interactions?


Shared attention

//Developmental timeline

• Mutual gaze

• Gaze following

Shared attention


• Imperative pointing

• Declarative pointing (create shared attention)

Shared attention

Shared attention

//Open questions


Shared attention

//Models (B.Scassellati, MIT)



Shared attention

//Robots that Learn to Converse:
