Speech course of 9 March 2005. Instructors: Dr. Dijana Petrovska-Delacrétaz and Gérard Chollet


Page 1

Speaker Recognition

1. Introduction, history, application domains
2. Cues to speaker identity in speech
3. Speaker verification
   1. Decision theory
   2. Text-dependent / text-independent
4. Voice imposture
5. Audio-visual identity verification
6. Evaluations
7. Conclusions

Page 2

Why should a computer recognize who is speaking?

• Protection of individual property (home, bank account, personal data, messages, mobile phone, PDA, ...)
• Limited access (secured areas, databases)
• Personalization (only respond to its master's voice)
• Locate a particular person in an audio-visual document (information retrieval)
• Who is speaking in a meeting?
• Is a suspect the criminal? (forensic applications)

Page 3

Tasks in Automatic Speaker Recognition

• Speaker verification (voice biometrics): Are you really who you claim to be?
• Identification (speaker ID): Is this speech segment coming from a known speaker? How large is the set of speakers (population of the world)?
• Speaker detection, segmentation, indexing, retrieval, tracking: looking for recordings of a particular speaker
• Combining speech and speaker recognition: adaptation to a new speaker, speaker typology, personalization in dialogue systems

Page 4

Applications

• Access control: physical facilities, computer networks, websites
• Transaction authentication: telephone banking, e-commerce
• Speech data management: voice messaging, search engines
• Law enforcement: forensics, home incarceration

Page 5

Voice Biometrics

• Advantages
  - Often the only modality over the telephone
  - Low cost (microphone, A/D), ubiquity
  - Possible integration on a smart (SIM) card
  - Natural bimodal fusion: speaking face
• Disadvantages
  - Lack of discretion
  - Possibility of imitation and electronic imposture
  - Lack of robustness to noise, distortion, ...
  - Temporal drift

Page 6

Speaker Identity in Speech

• Differences in
  - Vocal tract shapes and muscular control
  - Fundamental frequency (typical values): 100 Hz (male), 200 Hz (female), 300 Hz (child)
  - Glottal waveform
  - Phonotactics
  - Lexical usage
• The difference between the voices of twins is a limiting case
• Voices can also be imitated or disguised

Page 7

Speaker Identity

[Figure: spectral envelope of /i:/ (amplitude A versus frequency f) for Speaker A and Speaker B]

• Segmental factors (~30 ms)
  - Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
  - Vocal tract: characterized by its transfer function and represented by MFCCs (Mel-Frequency Cepstral Coefficients)
• Suprasegmental factors
  - Speaking rate (timing and rhythm of speech units)
  - Intonation patterns
  - Dialect, accent, pronunciation habits

Page 8

What are the sources of difficulty?

• Intra-speaker variability of the speech signal (due to stress, pathologies, environmental conditions, ...)
• Recording conditions (filtering, noise, ...)
• Channel mismatch between enrolment and testing
• Temporal drift
• Intentional imposture
• Voice disguise

Page 9

Acoustic features

• Short-term spectral analysis

Page 10

Intra- and Inter-speaker variability

Page 11

Speaker Verification

• Typology of approaches (EAGLES Handbook)
  - Text dependent: public password, private password, customized password, text prompted
  - Text independent
• Incremental enrolment
• Evaluation

Page 12

History of Speaker Recognition

Page 13

Current approaches

Page 14

Dynamic Time Warping (DTW)

$D(X, Y) = \sum_{\text{best path}} d^2(x_i, y_j)$

[Figure: the test utterance "Bonjour" from speaker Y is aligned along the best path with the reference utterance "Bonjour" of speaker X, for each enrolled speaker (speaker 1, speaker 2, ..., speaker n)]

DODDINGTON 1974, ROSENBERG 1976, FURUI 1981, etc.
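A minimal sketch of the DTW comparison on this slide, assuming squared Euclidean local distances and the three basic path moves (the slide does not specify path constraints). The names `sq_dist`, `dtw_distance`, `identify` and the toy one-dimensional "features" are illustrative, not taken from the course.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dtw_distance(X, Y):
    """Accumulated squared distance along the best alignment path of X and Y."""
    n, m = len(X), len(Y)
    # D[i][j] = cost of the best path aligning X[:i] with Y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sq_dist(X[i - 1], Y[j - 1])
            # Allowed moves: advance in X, advance in Y, or advance in both
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def identify(references, Y):
    """Return the name of the reference template closest to the test utterance Y."""
    return min(references, key=lambda name: dtw_distance(references[name], Y))


# Toy reference templates (one per enrolled speaker) and a test utterance
references = {
    "speaker_1": [[0.0], [1.0], [2.0]],
    "speaker_2": [[5.0], [6.0], [7.0]],
}
test_utterance = [[0.0], [1.0], [1.0], [2.0]]  # speaker_1 pattern, spoken more slowly
closest = identify(references, test_utterance)
```

For verification rather than identification, the same distance would be compared with a threshold for the claimed speaker only.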

Page 15

Vector Quantization (VQ)

$D(X, Y) = \sum_{\text{best quant.}} d^2(C_i^X, y_j)$

[Figure: the test utterance "Bonjour" from speaker Y is quantized with the codebook of speaker X, for each enrolled speaker (codebook of speaker 1, speaker 2, ..., speaker n)]

SOONG, ROSENBERG 1987
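A minimal sketch of codebook-based scoring as described on this slide: each test frame is quantized to its nearest codeword, and the squared distances are summed. The codebooks here are hand-written toy values; in practice they would be trained (for example by k-means) on each speaker's enrolment data. All names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def vq_distortion(codebook, Y):
    """Sum over test frames of the squared distance to the nearest codeword."""
    return sum(min(sq_dist(c, y) for c in codebook) for y in Y)


def identify(codebooks, Y):
    """Return the speaker whose codebook quantizes Y with the least distortion."""
    return min(codebooks, key=lambda name: vq_distortion(codebooks[name], Y))


# Toy codebooks (two codewords each) and three test frames
codebooks = {
    "speaker_1": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_2": [[5.0, 5.0], [6.0, 6.0]],
}
test_frames = [[0.0, 0.5], [1.0, 1.5], [0.0, 0.0]]
closest = identify(codebooks, test_frames)
```

Unlike DTW, this score ignores the order of the frames, which is why the approach also suits text-independent operation.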

Page 16

Hidden Markov Models (HMM)

$D(X, Y) = \sum_{\text{best path}} \log P(y_j \mid S_i^X)$

[Figure: the test utterance "Bonjour" from speaker Y is aligned along the best path with the HMM of "Bonjour" for speaker X, for each enrolled speaker (speaker 1, speaker 2, ..., speaker n)]

ROSENBERG 1990, TSENG 1992

Page 17

Ergodic HMM

$D(X, Y) = \sum_{\text{best path}} \log P(y_j \mid S_i^X)$

[Figure: the test utterance "Bonjour" from speaker Y is scored along the best path through the HMM of speaker X, for each enrolled speaker (HMM of speaker 1, speaker 2, ..., speaker n)]

PORITZ 1982, SAVIC 1990

Page 18

Gaussian Mixture Models (GMM)

REYNOLDS 1995

Page 19

HMM structure depends on the application

Page 20

Some issues in text-dependent speaker verification systems: the CAVE and PICASSO projects

• Sequences of digits
  - Speaker-independent HMM of each digit
  - Adaptation of these HMMs to the client voice (during enrolment and incremental enrolment)
  - EER of less than 1 % can be achieved
• Customized password
  - The client chooses his password using some feedback from the system
• Deliberate imposture

Page 21

Gaussian Mixture Model

• Parametric representation of the probability distribution of observations:
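The formula that followed this bullet is not present in the transcript. As a reminder, and not necessarily in the slide's own notation, the standard form of a Gaussian mixture density is given here; the symbols $M$ (number of components), $w_i$, $\mu_i$, $\Sigma_i$ and $D$ (feature dimension) are labels chosen for this sketch.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0,

\mathcal{N}(x; \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```

The speaker model $\lambda$ is then the set of parameters $\{w_i, \mu_i, \Sigma_i\}_{i=1}^{M}$.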

Page 22

Gaussian Mixture Models

8 Gaussians per mixture

Page 23

GMM speaker modeling

• World model: front-end → GMM modeling → world GMM model
• Target model: front-end → GMM model adaptation → target GMM model

Page 24

Baseline GMM method

[Diagram: test speech → front-end → scoring against the hypothesized target GMM model and the world GMM model → LLR score]

$\text{LLR score} = \log\left[ \frac{P(x \mid \lambda)}{P(x \mid \bar{\lambda})} \right]$

where $\lambda$ is the hypothesized target GMM model and $\bar{\lambda}$ is the world GMM model.
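A minimal sketch of the log-likelihood-ratio (LLR) score of the baseline GMM method, assuming diagonal-covariance GMMs and a score averaged over frames (both are assumptions; the slide does not state them). Models are plain lists of `(weight, mean, variance)` tuples, and all names and parameter values are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM, computed with log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def llr_score(frames, target, world):
    """Average per-frame log-likelihood ratio between target and world models."""
    total = sum(gmm_loglik(x, target) - gmm_loglik(x, world) for x in frames)
    return total / len(frames)


# Toy one-dimensional models: the target sits at 0, the world covers 0 and 4
target = [(1.0, [0.0], [1.0])]
world = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]

client_like = [[0.1], [-0.2], [0.0]]
impostor_like = [[4.1], [3.8], [4.0]]
score_client = llr_score(client_like, target, world)      # positive
score_impostor = llr_score(impostor_like, target, world)  # negative
```

The score is then compared with a threshold to accept or reject the identity claim.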

Page 25

Decision theory for identity verification

• Two types of errors:
  - False rejection (a client is rejected)
  - False acceptance (an impostor is accepted)
• Decision theory: given an observation O and a claimed identity
  - H0 hypothesis: it comes from an impostor
  - H1 hypothesis: it comes from our client
• H1 is chosen if and only if P(H1|O) > P(H0|O), which can be rewritten (using Bayes' law) as

$\frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)}$
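A minimal sketch of this decision rule in the log domain: the claim is accepted when the log-likelihood ratio exceeds the log of the prior ratio P(H0)/P(H1). The function name `accept_claim` and the numeric values are illustrative assumptions.

```python
import math


def accept_claim(log_lik_client, log_lik_impostor, prior_client=0.5):
    """Choose H1 (client) iff log P(O|H1) - log P(O|H0) > log(P(H0) / P(H1))."""
    prior_impostor = 1.0 - prior_client
    threshold = math.log(prior_impostor / prior_client)
    return (log_lik_client - log_lik_impostor) > threshold


# Same observation (log-likelihood ratio of 2.0), two different priors
with_equal_priors = accept_claim(-10.0, -12.0, prior_client=0.5)
with_rare_clients = accept_claim(-10.0, -12.0, prior_client=0.01)
```

With equal priors the threshold is 0; when genuine claims are assumed rare, the threshold rises and the same evidence no longer suffices.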

Page 26

Signal detection theory

Page 27

Decision

Page 28

Distribution of scores

Page 29

Detection Error Tradeoff (DET) Curve

Page 30

Evaluation

• Decision cost (FA, FR, priors, costs, ...)
• Receiver Operating Characteristic curve
• Reference systems (open software)
• Evaluations (algorithms, field trials, ergonomics, ...)
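A minimal sketch of the quantities behind these evaluation tools: false acceptance (FA) and false rejection (FR) rates at a threshold, an approximate equal error rate (EER), and a decision cost combining FA, FR, priors and costs. The threshold sweep over observed scores, the function names and the cost values are illustrative assumptions; official evaluation plans define their own cost parameters.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """FA and FR rates when a claim is accepted iff score > threshold."""
    fr = sum(s <= threshold for s in client_scores) / len(client_scores)
    fa = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return fa, fr


def equal_error_rate(client_scores, impostor_scores):
    """Approximate EER: the operating point where |FA - FR| is smallest."""
    best = None
    for t in sorted(set(client_scores) | set(impostor_scores)):
        fa, fr = error_rates(client_scores, impostor_scores, t)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, (fa + fr) / 2.0, t)
    return best[1], best[2]


def detection_cost(fa, fr, p_client, cost_fa=1.0, cost_fr=1.0):
    """Expected cost: C_FR * FR * P(client) + C_FA * FA * P(impostor)."""
    return cost_fr * fr * p_client + cost_fa * fa * (1.0 - p_client)


# Toy scores: clients tend to score higher than impostors
clients = [2.0, 1.5, 1.0, 0.2]
impostors = [-1.0, -0.5, 0.0, 0.5]
eer, eer_threshold = equal_error_rate(clients, impostors)
cost = detection_cost(0.25, 0.25, p_client=0.01, cost_fa=1.0, cost_fr=10.0)
```

Plotting the (FA, FR) pairs obtained over all thresholds gives the ROC curve; the DET curve shows the same pairs on normal-deviate axes.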

Page 31

NIST Speaker Verification Evaluations

• A reference standard to compare algorithms and stimulate new developments
• Distribution (via LDC) of development and test databases with:
  - Increasing difficulty (from land line to mobile)
  - Several hundred speakers (2 min of training data per client)
  - Several thousand test accesses (5 to 50 s per access)
• Participation of 15-20 labs every year (MIT, IBM, Nuance, Queensland Univ., ELISA consortium, ...)
• Annual workshop, special issues in journals, ...

Page 32

National Institute of Standards & Technology (NIST) Speaker Verification Evaluations

• Annual evaluation since 1995
• Common paradigm for comparing technologies

Page 33

Speaker Verification (text independent)

• The ELISA consortium: ENST, LIA, IRISA, ...
  http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html
• BECARS: Balamand-ENST CEDRE Automatic Recognition of Speakers
• NIST evaluations
  http://www.nist.gov/speech/tests/spk/index.htm

Page 34

NIST evaluations: Results

ENST 2003

Page 35

Evaluations: NIST 2004

Page 36

Combining Speech Recognition and Speaker Verification

• Speaker-independent phone HMMs
• Selection of segments or segment classes which are speaker specific
• Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

Page 37

ALISP: Automatic Language Independent Speech Processing

Data-driven speech segmentation

Page 38

Searching in client and world speech dictionaries for speaker verification purposes

Page 39

Fusion

Page 40

Fusion results

Page 41

Voice Transformations and Forgery (occasional, dedicated)

• Isolated individuals with few resources or "professional impostors" with a dedicated budget can threaten the security of speaker recognition systems
• Voice transformation technologies (e.g. segmental synthesis using an inventory of client speech data) are now available
• Speaker recognition research should explicitly address this forgery issue and define appropriate countermeasures
• Prevention by predicting many different forgery scenarios

Page 42

Voice Forgery using ALISP

A modification of a source speaker's speech to imitate a target speaker

[Diagram: speech from the impostor and speech from the client (the same words or not) feed the transformation]

Page 43

Conversion system: ALISP encoder

[Block diagram. Labels: speech; MFCC analysis (MFCC + delta); HMM recognition; HMM models; HNM; harmonic envelope; noise envelope; database of HNM representatives; choice of the best representative unit; prosody (energy + pitch). Outputs: symbol index, representative index, DTW path]

Page 44

Conversion system: ALISP decoder

[Block diagram. Labels: symbol index; representative index; DTW path; pitch, energy, timing; concatenation of HNM parameters for each representative; HNM synthesis; speech signal]

Page 45

Preliminary results: DET curves

• FA before forgery: 16 ± 2.0 % (1700 files)
• FA after forgery: 26 ± 2.0 % (1700 files)
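The slide does not say how the ± 2.0 % margins were obtained. One plausible reading, stated here purely as an assumption, is a 95 % confidence interval for a proportion under the normal approximation; the sketch below computes that half-width for the reported false acceptance rates and 1700 trials. The function name and the factor `z = 1.96` are illustrative choices.

```python
import math


def ci_half_width(rate, n_trials, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(rate * (1.0 - rate) / n_trials)


# Reported false acceptance rates, each measured on 1700 files
before = ci_half_width(0.16, 1700)  # about 0.017, i.e. roughly +/- 1.7 %
after = ci_half_width(0.26, 1700)   # about 0.021, i.e. roughly +/- 2.1 %
```

Both values are close to the stated 2.0 %, and under this reading the two intervals (about 14 to 18 % and 24 to 28 %) do not overlap.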

Page 46

Preliminary results

True distributions

Page 47

Multimodal Identity Verification

• M2VTS (face and speech)
  - Front view and profile
  - Pseudo-3D with coherent light
• BIOMET (face, speech, fingerprint, signature, hand shape)
  - Data collection
  - Reuse of the M2VTS and DAVID databases
  - Experiments on the fusion of modalities

Page 48

Speaking Faces: Motivations

• In many situations a video sequence is acquired
• Fusion of face and speech increases robustness
• Forgery is more difficult

Page 49

Talking Face Recognition (hybrid verification)

Page 50

Lip features

• Tracking lip movements

Page 51

A talking face model

• Using Hidden Markov Models (HMMs)
  - Acoustic parameters
  - Visual parameters

Page 52

Imposture Model

Page 53

Cloning

Page 54

Conclusions, Perspectives

• Deliberate imposture is a challenge for speech-only systems
• Verification of identity based on features extracted from talking faces should be developed
• Common databases and evaluation protocols are necessary
• Free access to reference systems will facilitate future developments

Page 55

BioSecure Residential Workshop

• Aug. 1st - 26th, 2005 at ENST, Paris
• Reference systems for speech, face, talking face, fingerprint, iris, hand, signature, ...
• Comparative evaluations on large databases (BIOMET, BANCA, FVC, ...)
• Fusion of modalities

http://www.biosecure.info