Gaze interaction (2):
models and technologies
Corso di Interazione uomo-macchina II
Prof. Giuseppe Boccignone
Dipartimento di Scienze dell’Informazione
Università di Milano
[email protected]
http://homes.dsi.unimi.it/~boccignone/
A. Vinciarelli, M. Pantic, H. Bourlard, Social Signal Processing: Survey of an Emerging Domain,Image and Vision Computing (2008)
Gaze interaction
Gaze estimation without eye trackers
• Problem!
• Eye detection
• detect the existence of eyes
• accurately interpret eye positions in the images
• using the pupil or iris center.
• for video images, the detected eyes are tracked from frame to frame.
• Gaze estimation: the detected eyes in the images are used to estimate and track
where a person is looking in 3D or, alternatively, to determine the 3D line of
sight.
Gaze estimation without eye trackers
Eye detection
//eye models
• Identify a model of the eye which is sufficiently expressive to take account of
large variability in the appearance and dynamics, while also sufficiently
constrained to be computationally efficient
• Even for the same subject, a relatively small variation in viewing angle can
cause significant changes in appearance:
Eyelids may appear straight from one view but highly curved from another.
The iris contour also changes with viewing angle.
(In the figure, the dashed lines indicate where the eyelids appear straight;
the solid yellow lines represent the major axis of the iris ellipse.)
Eye detection
//eye models
• The eye image may be characterized by
• the intensity distribution of the pupil(s), iris, and cornea,
• their shapes.
• Ethnicity, viewing angle, head pose, color, texture, light conditions, the
position of the iris within the eye socket, and the state of the eye (i.e., open/
close) are issues that heavily influence the appearance of the eye.
• The intended application and available image data lead to different prior eye
models.
• The prior model representation is often applied at different positions,
orientations, and scales to reject false candidates
Eye detection
//eye models
• Shape-based methods: use a prior model of eye shape and surrounding
structures
• fixed shape
• deformable shape
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• intensity-based methods
• subspace-based methods
• Hybrid methods: combine feature, shape, and appearance approaches to
exploit their respective benefits
Eye detection
//eye models: Shape-Based Approaches
• Shape-based methods: use a prior model of eye shape and a similarity
measure
• Prior model of eye shape and surrounding structures
• iris and pupil contours and the exterior shape of the eye (eyelids)
• simple elliptical or of a more complex nature
• parameters of the geometric model define the allowable template deformations
and contain parameters for rigid (similarity) transformations and parameters for
nonrigid template deformations
• ability to handle shape, scale, and rotation changes
Eye detection
//eye models: Shape-Based Approaches
• Simple Elliptical Shape Models:
• example: Valenti and Gevers
• uses isophote (i.e., curves connecting points of equal intensity) properties to infer the
center of (semi)circular patterns which represent the eyes
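The isophote idea can be sketched in a few lines: every pixel casts a vote for a candidate circle center, displaced along the gradient by the (inverse-curvature) isophote displacement, with votes weighted by gradient magnitude. This is an illustrative numpy sketch under stated assumptions (pre-smoothed grayscale input, unit pixel spacing), not Valenti and Gevers' implementation:

```python
import numpy as np

def isophote_center_votes(img):
    """Illustrative isophote-based center voting (in the spirit of
    Valenti & Gevers). `img` is assumed to be a pre-smoothed
    grayscale float array; names and thresholds are assumptions."""
    L = img.astype(float)
    # First and second derivatives via central differences.
    # np.gradient returns derivatives in (axis 0, axis 1) = (y, x) order.
    Ly, Lx = np.gradient(L)
    Lyy, Lyx = np.gradient(Ly)
    Lxy, Lxx = np.gradient(Lx)

    # Curvature-related denominator of the isophote displacement vector
    denom = Ly**2 * Lxx - 2.0 * Lx * Lxy * Ly + Lx**2 * Lyy
    denom = np.where(np.abs(denom) < 1e-9, 1e-9, denom)
    # Signed displacement from each pixel to its estimated isophote center
    disp = (Lx**2 + Ly**2) / denom
    dx, dy = -Lx * disp, -Ly * disp

    h, w = L.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    cy = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    votes = np.zeros_like(L)
    # Weight each vote by gradient magnitude, so flat regions barely count
    np.add.at(votes, (cy, cx), np.hypot(Lx, Ly))
    return votes
```

For a radially symmetric blob, every pixel's displacement points exactly at the blob center (regardless of whether the blob is dark or bright), so the vote map peaks at the (semi)circular pattern's center.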
• example: Webcam-based Visual Gaze Estimation (Valenti et al)
• uses isophote (i.e., curves connecting points of equal intensity) voting; no head
pose is required
(Figure: each pixel votes along the direction to the center)
Eye detection
//eye models: Shape-Based Approaches
• Simple Elliptical Shape Models:
• example: Webcam-based Visual Gaze Estimation (Valenti et al)
• uses a scale-space framework for multiresolution
Eye detection
//eye models: Shape-Based Approaches
• Simple Elliptical Shape Models:
• example: Webcam-based Visual Gaze Estimation (Valenti et al)
• simple interpolants for easy calibration
Eye detection
//eye models: Shape-Based Approaches
• Complex Shape Models:
• example: Yuille deformable templates
Eye detection
//eye models: Shape-Based Approaches
• Complex Shape Models:
• 1. computationally demanding,
• 2. may require high-contrast images, and
• 3. usually need to be initialized close to the eye for successful localization; for
large head movements, they consequently need other methods to provide a good
initialization
Eye detection
//eye models: Feature-Based Shape Methods
• Explore the characteristics of the human eye to identify a set of distinctive
features around the eyes.
• The limbus, pupil (dark/bright pupil images), and cornea reflections are
common features used for eye localization
• Local Features by Intensity
• The eye region contains several boundaries that may be detected by gray-level
differences
• Local Feature by Filter Responses
• Filter responses enhance particular characteristics in the image while suppressing
others. A filter bank may therefore enhance desired features of the image and, if
appropriately defined, deemphasize irrelevant features
Eye detection
//eye models: Feature-Based Shape Methods
• Local Features by Intensity
• The eye region contains several boundaries that may be detected by gray-level
differences (Harper et al.)
Eye detection
//eye models: Feature-Based Shape Methods
• Local Features by Intensity
• The eye region contains several boundaries that may be detected by gray-level
differences
Sequential search strategy
Eye detection
//eye models: Feature-Based Shape Methods
• Local Feature by Filter Responses
• Filter responses enhance particular characteristics in the image while suppressing
others
• Example Sirohey and Rosenfeld:
• Edges of the eye’s sclera are detected with four Gabor wavelets. A nonlinear filter is
constructed to detect the left and right eye corner candidates.
• The eye corners are used to determine eye regions for further analysis. Postprocessing
steps are employed to eliminate the spurious eye corner candidates.
• A voting method is used to locate the edge of the iris. Since the upper part of the iris may
not be visible, the votes are accumulated by summing edge pixels in a U-shaped annular
region. The annulus center receiving the most votes is selected as the iris center
• To detect the edge of the upper eyelid, all edge segments are examined in the eye region
and fitted to a third-degree polynomial
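The voting step above can be sketched as a brute-force accumulator over candidate centers. This is illustrative only: the `r_min`/`r_max` radii and the lower-half test standing in for the U-shaped annular region are assumptions, not Sirohey and Rosenfeld's implementation:

```python
import numpy as np

def iris_center_by_annular_voting(edges, r_min, r_max):
    """Sketch of annular voting for the iris center: each candidate
    center accumulates the edge pixels lying in a lower-half annulus
    (the upper iris arc is often occluded by the eyelid).
    Brute force for clarity; the radii are illustrative parameters."""
    h, w = edges.shape
    ey, ex = np.nonzero(edges)
    best, best_votes = (0, 0), -1
    for cy in range(h):
        for cx in range(w):
            dx, dy = ex - cx, ey - cy
            r = np.hypot(dx, dy)
            # U-shaped region: annulus restricted to points at or below
            # the candidate center (dy >= 0 in image coordinates)
            in_region = (r >= r_min) & (r <= r_max) & (dy >= 0)
            votes = int(in_region.sum())
            if votes > best_votes:
                best, best_votes = (cy, cx), votes
    return best, best_votes
```

The center receiving the most votes is selected as the iris center, matching the selection rule described above.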
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• intensity-based methods
• subspace-based methods
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• intensity-based methods (example: Grauman et al.)
• During the first stage of processing, the eyes are automatically located by searching
temporally for "blink-like" motion
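The "model matching using a similarity measure" step above is commonly implemented with normalized cross-correlation between an eye-patch template and every candidate window. A brute-force plain-numpy sketch (illustrative of the general technique, not Grauman et al.'s blink-detection code):

```python
import numpy as np

def ncc_match(image, template):
    """Slide the eye-patch template over the image and return the
    location and score of the best normalized cross-correlation
    match. Brute force for clarity; scores lie in [-1, 1]."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t * t).sum())
    best_loc, best_score = (0, 0), -np.inf
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            p = image[y:y + th, x:x + tw]
            p = p - p.mean()
            pnorm = np.sqrt((p * p).sum())
            # Skip constant patches, where NCC is undefined
            if pnorm < 1e-12 or tnorm < 1e-12:
                continue
            score = float((p * t).sum() / (pnorm * tnorm))
            if score > best_score:
                best_score, best_loc = score, (y, x)
    return best_loc, best_score
```

In practice the search is run at several scales and positions, as noted earlier, to reject false candidates.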
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• subspace methods (eigeneyes)
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• subspace methods (eigeneyes)
• How can we find an efficient representation of such a data set?
• Rather than storing every image, we might try to represent the images more effectively,
e.g., in a lower-dimensional subspace
• We seek a linear basis with which each image in the ensemble is approximated as a linear
combination of basis images
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• subspace methods (eigeneyes)
• How can we find an efficient representation of such a data set?
• Rather than storing every image, we might try to represent the images more effectively,
e.g., in a lower-dimensional subspace
• let’s select the basis to minimize squared reconstruction error
Eye detection
//eye models
• Appearance-based methods: rely on models built directly on the appearance
of the eye region: template matching by constructing an image patch model
and performing eye detection through model matching using a similarity
measure
• subspace methods (eigeneyes)
• How can we find an efficient representation of such a data set?
• Rather than storing every image, we might try to represent the images more effectively,
e.g., in a lower-dimensional subspace
• The eigenvectors of the sample covariance matrix of the image data provide the principal
axes of this subspace
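The eigeneyes idea above can be sketched in a few lines of numpy: compute the top-k eigenvectors of the sample covariance of vectorized eye patches (via SVD of the centered data), then score a candidate patch by its squared-error reconstruction in that subspace. A minimal sketch, not any particular published implementation:

```python
import numpy as np

def eigeneyes(patches, k):
    """Learn a k-dimensional 'eigeneyes' subspace from an array of
    eye patches (n, h, w). Returns the mean patch (as a vector) and
    the top-k basis vectors (rows)."""
    X = patches.reshape(len(patches), -1).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Right singular vectors = eigenvectors of the sample covariance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(patch, mean, basis):
    """Distance between a candidate patch and its projection onto
    the eigeneyes subspace; low error suggests an eye-like patch."""
    v = patch.ravel().astype(float) - mean
    coeffs = basis @ v
    recon = basis.T @ coeffs
    return float(np.linalg.norm(v - recon))
```

Detection can then threshold this reconstruction error over candidate windows: patches well explained by the linear basis are accepted as eyes.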
Eye detection
//in summary
• Shape-based methods: use a prior model of eye shape and surrounding structures
• fixed shape
• deformable shape
• Appearance-based methods: rely on models built directly on the appearance of the eye
region: template matching by constructing an image patch model and performing eye
detection through model matching using a similarity measure
• intensity-based methods
• subspace-based methods
• Hybrid methods: combine feature, shape, and appearance approaches to exploit their
respective benefits
• Other methods: eye trackers using active light (IR), which we have already considered
Gaze estimation
• Gaze:
• the gaze direction
• the point of regard (PoR or fixation)
• Gaze modeling consequently focuses on the relations between the image
data and the point of regard/gaze direction.
Gaze estimation
//some general problems
• 1. camera calibration: determining intrinsic camera parameters;
• 2. geometric calibration: determining the relative locations and orientations of the
different units in the setup, such as camera, light sources, and monitor;
• 3. personal calibration: estimating cornea curvature and the angular offset between
visual and optical axes; and
• 4. gaze-mapping calibration: determining the parameters of the eye-gaze mapping
functions.
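Step 4 is often realized, in 2D regression-based gaze estimation, by fitting a low-order polynomial from eye features (e.g., pupil-glint vectors) to screen coordinates using a handful of calibration points. A least-squares sketch with a second-order polynomial; the choice of terms is a common convention, not a prescription of any specific system:

```python
import numpy as np

def _design(eye_xy):
    # Second-order terms: 1, x, y, xy, x^2, y^2
    x, y = eye_xy[:, 0], eye_xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_gaze_mapping(eye_xy, screen_xy):
    """Fit the eye-to-screen polynomial mapping from calibration
    pairs: eye_xy (n, 2) features, screen_xy (n, 2) targets."""
    coeffs, *_ = np.linalg.lstsq(_design(eye_xy), screen_xy, rcond=None)
    return coeffs

def map_gaze(coeffs, eye_xy):
    """Predict screen coordinates (PoR) for new eye features."""
    return _design(eye_xy) @ coeffs
```

With a 3x3 calibration grid (nine points) the six coefficients per screen axis are well determined; denser grids trade calibration effort for accuracy.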
Gaze estimation
//methods
• IR light and feature extraction:
• 2D Regression-Based Gaze Estimation
• 3D Model-Based Gaze Estimation
• Appearance based methods
• Similarly to the appearance models of the eyes, appearance-based models for gaze
estimation do not explicitly extract features, but rather use the image contents as input
with the intention of mapping these directly to screen coordinates (PoR).
• do not require calibration of cameras and geometry data, since the mapping is made
directly on the image contents
• Natural light methods
• Natural light approaches face several new challenges, such as light changes in the
visible spectrum and lower-contrast images, but are not as sensitive to IR light in the
environment and may thus be better suited for outdoor use
Gaze estimation
//methods
• Appearance based methods
• Example: K.-H. Tan, D.J. Kriegman, and N. Ahuja,: appearance manifold model
• treat an image as a point in a high-dimensional space: a 20 pixel by 20 pixel intensity image
can be considered a 400-component vector, or a point in a 400-dimensional space
(appearance manifold)
(Figure: each manifold point s_i is an image of an eye, labeled with the 2D
coordinate of a point on a display.)
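The manifold idea above can be sketched as local interpolation: a new eye image (as a raveled vector) is mapped to a point of regard by distance-weighted interpolation of the screen labels of its nearest calibration samples. An illustrative numpy sketch (the weighting scheme and `k` are assumptions, not Tan, Kriegman, and Ahuja's implementation):

```python
import numpy as np

def por_from_neighbors(eye_vec, samples, labels, k=3):
    """Map a vectorized eye image to a 2D screen point by
    interpolating the labels of its k nearest neighbors on the
    appearance manifold.

    eye_vec: (d,) query image vector
    samples: (n, d) calibration image vectors
    labels:  (n, 2) screen coordinates of the calibration samples
    """
    d = np.linalg.norm(samples - eye_vec, axis=1)
    idx = np.argsort(d)[:k]
    # Inverse-distance weights; the epsilon avoids division by zero
    # when the query coincides with a calibration sample
    w = 1.0 / (d[idx] + 1e-9)
    w /= w.sum()
    return (w[:, None] * labels[idx]).sum(axis=0)
```

A query identical to a calibration image returns that image's screen label, and nearby queries blend the labels of neighboring manifold points.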
Gaze estimation
//methods
• Appearance based methods
• Example: Williams, Blake & Cipolla: mapping images to continuous output spaces using
powerful Bayesian learning techniques
Gaze estimation
//methods
• Example: Williams, Blake & Cipolla: mapping images to continuous output spaces using
powerful Bayesian learning techniques
• Rather than using raw pixel data, input images are processed to obtain different types of
feature
• To infer the input–output mapping for unseen inputs in real-time: sparse regression
model (Gaussian Processes)
• Method is fully Bayesian: output predictions are provided with a measure of uncertainty
• During the learning phase, all unknown modelling parameters are inferred from data as
part of the Bayesian framework: no dynamics need to be known a priori.
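A minimal Gaussian-process regression sketch of the mapping described above (features to screen coordinate), with an RBF kernel. The actual system uses a sparse, semi-supervised GP; this toy version only illustrates the predictive mean and the Bayesian "measure of uncertainty", and the hyperparameters here are arbitrary assumptions:

```python
import numpy as np

def gp_predict(X, y, Xstar, lengthscale=1.0, noise=1e-4):
    """GP regression with an RBF kernel (unit prior variance).
    Returns the predictive mean and variance at the test inputs.
    y may be (n,) or (n, outputs) for multi-output targets."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xstar, X)
    # Predictive mean: Ks K^{-1} y
    mean = Ks @ np.linalg.solve(K, y)
    # Predictive variance: k(x*,x*) - Ks K^{-1} Ks^T (diagonal only)
    v = np.linalg.solve(K, Ks.T)
    var = np.maximum(1.0 - (Ks * v.T).sum(axis=1), 0.0)
    return mean, var
```

Near the training data the predictive variance is small; far from it the variance approaches the prior, which is exactly the per-prediction uncertainty the slide refers to.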
Gaze estimation
//methods
• Appearance based methods
• Example: Williams, Blake & Cipolla: mapping images to continuous output spaces using
powerful Bayesian learning techniques
• Can be applied to other contexts
Gaze estimation
//using other cues
Gaze estimation
//head-tracking
• The Watson head-tracker
• a real-time object tracker that uses range and appearance
information from a stereo camera to recover the 3D
rotation and translation of objects, or of the camera itself.
• The system can be connected to a face detector and
used as an accurate head tracker.
• Additional supporting algorithms can improve the
accuracy of the tracker
• Software download
• http://groups.csail.mit.edu/vision/vip/watson/index.htm
The Watson head tracker
//head pointing
The Watson head tracker,
//Interactive Kiosk
Shared attention
• Shared attention through gaze interactions?
Shared attention
//Developmental timeline
• Mutual gaze
• Gaze following
Shared attention
• Imperative pointing
• Declarative pointing (create
shared attention)
Shared attention
//Open questions
Shared attention
//Models (B.Scassellati, MIT)
Shared attention
//Robots that Learn to Converse: