Speech Processing at FPMs (1983-2000)


T. Dutoit

TCTS Lab

Faculté Polytechnique de Mons

Belgium


Outline
• Intro: MULTITEL-TCTS
• Speech processing: a problem in its own right
• Speech synthesis
  – What for?
  – A brief history of speech synthesis
  – The MBROLA project
  – A new technological revolution
• Speech recognition
  – Speech recognition? What for?
  – A (very) brief history of recognition
  – Examples: THISL, Démosthènes

MULTITEL-TCTS (Théorie des Circuits et Traitement du Signal: Circuit Theory and Signal Processing)
• 25 teaching staff and researchers, since 1983
• Industrial contracts (SAIT, L&H, ACEC, BRT)
• 1992-1995: ESPRIT project HIMARNNET: speaker-independent isolated-word recognition over telephone lines (FPMs, L&H, ASCOM, TEDAS, EPFL)
• 1994: MBROLA project in speech synthesis
• 1995-2000: creation of the MULTITEL-TCTS group, with Walloon Region/EEC funding under Objective 1
• THISL, RESPITE, SPRACH, DEMOSTHENES, EULER, W
• 1997: Babel Technologies S.A.

So you thought speech processing was just a component of signal processing :)
• Signals carry information (= unpredictable data) from source to receiver: communication signals, images, biological signals, speech
• Complexity of signals = f(complexity of source/receiver), and vice-versa
  – Speech is produced, perceived, and understood by the most complex of all machines
    • Speech is perceived and understood when produced (ex: deaf-mute; Lombard effect)
    • What is predictable by the brain is not transmitted (“Please take a seat”)


“These speech systems provide excellent examples for the study of complex systems, since they raise fundamental issues in system partitioning, choice of descriptive units, representational techniques, levels of abstraction, formalisms for knowledge representation, the expression of interacting constraints, techniques of modularity and hierarchy, techniques for characterizing the degree of belief in evidence, subjective techniques for the measurement of stimulus quality, naturalness and preference, the automatic determination of equivalence classes, adaptive model parameterization, tradeoffs between declarative and procedural representations, system architectures, and the exploitation of contemporary technology to produce real-time performance with acceptable cost.” (Allen, 1985)

A problem in its own right
• Signal processing
• Acoustics
• Phonetics (multilingual)
• Computational linguistics
• Software engineering (!)

Tasks: Coding, Synthesis, Recognition, Understanding (dialogue, translation)

TTS: What for?
• Telephone-based applications
  – Telecommunications ($)
    • Who’s calling
    • Integrated messaging (fax, email, answering machine)
    • Automatic reverse directory
    • Personal telephone attendant
  – Voice access to databases (70% of calls require very little interactivity)
    • Price lists
    • Cultural events
    • Weather report

• Multimedia
  – CD-ROMs
  – Talking books
  – Interactive games
• Man-machine communication
• Help to the disabled
  – Speech impairment
    • Artificial voice
  – Sight impairment
    • Automatic reading of electronic documents
    • Automatic reading of paper documents (with OCR)
• Fundamental research

A brief history of speech synthesis
1936: Homer Dudley (Bell Labs) invents the VODER, the first electric synthesizer ever
[Figure: VODER schematic. Labels: noise source (unvoiced) and oscillator (voiced); energy-switch wrist bar; pitch-control pedal; resonance control amplifier; VODER console keyboard with keys 1 to 10 plus "quiet", t-d, p-b and k-g keys]

1964: Rule-based synthesis (1979, MITalk; 1981, Klattalk; 1983, DECtalk)

InfoVox (1983-95)

Berkeley Speech Technology (1990)

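Diphone-based synthesis
[Figure: diphone-based synthesis of "dog". The phoneme string _ d o g _ (durations 50 ms, 80 ms, 160 ms, 70 ms, 50 ms, with an F0 contour) is mapped to the diphones _d, do, og, g_ from a diphone database; prosody modification and smooth joints follow. Waveform plot: samples 0 to 8000, amplitude -1 to 1 (× 10^4)]

Bell Labs (90s)

CNET, 1989

LIMSI, Paris, 1989

FPMs, 1993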
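
The mapping from the phoneme string _ d o g _ to the diphones _d, do, og, g_ shown on this slide can be sketched in a few lines. This is only an illustration, not the MBROLA implementation: the function names are hypothetical, and the duration rule assumes that each diphone runs from the midpoint of one phoneme to the midpoint of the next, which is a simplification.

```python
def to_diphones(phonemes):
    """Pair each phoneme with its successor: [_ d o g _] -> [_d, do, og, g_]."""
    return [left + right for left, right in zip(phonemes, phonemes[1:])]


def diphone_durations(durations_ms):
    """Nominal diphone durations, assuming cuts at phoneme midpoints.

    Assumption (not from the slides): each diphone takes the second half of
    its left phoneme and the first half of its right phoneme.
    """
    return [(left + right) / 2 for left, right in zip(durations_ms, durations_ms[1:])]


# Values taken from the slide: "dog" padded with silences, durations in ms.
phonemes = ["_", "d", "o", "g", "_"]
durations_ms = [50, 80, 160, 70, 50]

print(to_diphones(phonemes))
print(diphone_durations(durations_ms))
```

A real system would then fetch each unit from the diphone database, stretch or compress it to the target duration and F0, and smooth the joints between units.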

The MBROLA Project

(20 languages)

> 80 persons actively involved

Patented, 1996

ITEA 96 European Award

Collaboration with

Creation of

Kluwer (97) - PPUR (2000)

DEMO

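The NLP module
[Figure: block diagram of the NLP module. Text enters the Text Analyzer (Pre-Processor, Morphological Analyzer, Contextual Analyzer, Syntactic-Prosodic Parser), followed by the Letter-To-Sound module and the Prosody generator, whose output goes to the DSP block. A remaining label reads "MLDS or FSs"]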

TTS: A revolution under way
1995-?: The database years
– For automatic phonetization (L&H, ENST, Univ. Edinburgh, FPMs)
– For automatic generation of intonation and phoneme duration (AT&T, FPMs, Univ. Aix, Univ. Edinburgh)
– For automatic selection of units for concatenative synthesis (ATR, Univ. Edinburgh, AT&T, FPMs?)

TTS: A New Challenge
[Figure: diphone-based synthesis. The phoneme string _ d o g _ with its F0 contour is mapped to the diphones _d, do, og, g_ from a diphone database; prosody modification and smooth joints produce the waveform]

TTS: A New Challenge
[Figure: unit selection-based synthesis. The units _d, do, og, g_ for the phoneme string _ d o g _ are taken from a VERY LARGE CORPUS rather than a diphone database; prosody modification and smooth joints produce the waveform]

Software Eng. Concerns
1. Automatic phonetization
2. Automatic prosody generation
3. Speech synthesis
[Figure: TEXT-TO-SPEECH SYNTHESIZER. TEXT → NATURAL LANGUAGE PROCESSING (linguistic formalisms, inference engines, logical inferences) → narrow phonetic transcription (phones, prosody) → DIGITAL SIGNAL PROCESSING (mathematical models, algorithms, computations) → SPEECH]

• Signal Processing: MATLAB
• Speech Recognition: HTK, WATSON, STRUT, …
• Speech Synthesis: FESTIVAL, EULER

1. Future milestones in speech processing will come from labs with strong commitment to solid, portable, and extensible code;
2. Speech scientists and software engineers will soon be the same people.

Modular TTS: DLL-based (.so on Linux)
1.0 (May 99): French - MS Windows
2.0 (Oct. 2000): Multilingual - Win-Linux
[Figure: modular architecture. Labels: INIT; Perl scripts; Preprocessor; Phonetizer; Prosodic grouping; Duration; F0; MBROLA (diphones); User module; Rules; CARTs; MLC]

DEMO

The MBROLA family

The W Project
Aid to the disabled
– Speech disabilities: vocal tract prosthesis
  • INTERFACE???
– Visual disabilities
  • Automatic reading of electronic documents
  • + OCR for reading paper documents

The W Project
• A freely available, multilingual speaking machine for people with speech disabilities?
  – Freely available multilingual TTS: EULER/MBROLA
  – Freely available multilingual user interface?
    • Word prediction? No real keystroke reduction for real texts
    • Word contractions: Grade II Braille (abbreviations for words and groups of letters; used for more than 100 years; methods available; exists for various languages).
• From W to HOOK

DEMO

Speech recognition
[Figure: block diagram. Training: Speech → Feature extraction → Model training → Lexical units, Dictionary, Word models; Texts → Grammars, N-grams. Recognition: Speech → Feature extraction → Decoding → Most probable sentence]

Speech recognition: What for?
• Command and control: control of specific equipment, programs, ...
• Access to databases: home banking, telephone numbers, voice servers, ...
• Dictation: writing letters, reports and other documents, ...
• Automatic transcription: indexing of TV or radio programmes, subtitling, …
• Other: language learning, games, ...

Classification
• Speaker-dependent or speaker-independent
• Speaking style
  – Isolated words
  – Connected words
  – Continuous speech
  – Spontaneous speech
  – Keywords
• Vocabulary size (from a few words to several tens of thousands of words)
• Grammatical constraint: N-grams.
• Noisy environments, telephone lines, ...
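
To illustrate the N-gram grammatical constraint listed on this slide, here is a minimal bigram (N = 2) sketch using plain maximum-likelihood counts. The tiny training corpus and the function names are hypothetical, and no smoothing is applied, so unseen word pairs get probability zero; practical recognizers use much larger corpora and smoothed estimates.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count contexts and word pairs, with <s> and </s> as sentence markers."""
    contexts, pairs = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])          # every word that has a successor
        pairs.update(zip(words, words[1:]))  # (previous word, word)
    return contexts, pairs


def sentence_prob(contexts, pairs, sentence):
    """Product of P(word | previous word) over the sentence (no smoothing)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0
        prob *= pairs[(prev, word)] / contexts[prev]
    return prob


# Hypothetical command-and-control corpus.
corpus = ["call home", "call the bank", "open the door"]
contexts, pairs = train_bigrams(corpus)

print(sentence_prob(contexts, pairs, "call the door"))  # plausible word order
print(sentence_prob(contexts, pairs, "door the call"))  # unseen word order
```

During decoding, such probabilities are combined with acoustic scores so that word sequences the language model considers unlikely are penalised.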

A brief history...
• Early systems: search for invariant parameters to identify phonemes (phoneticians' methods). Not very effective.
• 1970: methods based on dynamic programming (DTW). Effective for small vocabularies. Speaker-dependent.
• 1980: statistical methods: HMMs (Hidden Markov Models). Improved recognition rates. Speaker-independent systems. Large vocabulary.
• 1990: hybrid methods: HMMs / MLP (neural networks). Systems that are more robust (to noise), faster and more accurate.
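
The dynamic-programming approach (DTW) from the 1970s entry can be sketched as follows. This is a textbook dynamic time warping with unconstrained steps, applied to hypothetical one-dimensional feature sequences; real recognizers compare sequences of spectral feature vectors and usually add path constraints. The template values and function names are illustrative assumptions.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates recorded by one speaker (hence speaker-dependent).
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}
slow_yes = [1, 1, 3, 4, 4, 3, 1]  # same shape as "yes", spoken more slowly

print(dtw_distance(slow_yes, templates["yes"]))
print(recognize(slow_yes, templates))
```

Because every vocabulary word needs its own template and each comparison is quadratic in sequence length, the method suits small vocabularies, as the slide notes.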

The THISL project
ESPRIT Project 23495 THISL
[Figure: two phases. Indexing phase: TV/radio programmes → automatic transcription via LVCSR → indexing via IR → index. Retrieval phase: spoken or written query (e.g. "What about Bill Clinton?") → search via NLP + IR → ranked list of retrieved programmes (written passages + audio excerpts)]
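
The two phases of this slide (indexing transcripts, then ranking programmes for a query) can be illustrated with a toy inverted index and TF-IDF scoring. This is a generic information-retrieval sketch, not the THISL system: the transcripts, identifiers and function names are hypothetical, and tokenisation is a plain whitespace split.

```python
import math
from collections import Counter, defaultdict


def build_index(transcripts):
    """Indexing phase: map each term to {programme id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in transcripts.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Retrieval phase: rank programmes by summed TF-IDF of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # 0 for terms found everywhere
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, score in scores.most_common() if score > 0]


# Hypothetical transcripts, standing in for LVCSR output.
transcripts = {
    "news_1": "president clinton met the press today",
    "news_2": "the weather today is sunny",
    "sport_1": "the match ended in a draw",
}
index = build_index(transcripts)

print(search(index, len(transcripts), "bill clinton"))
print(search(index, len(transcripts), "the"))
```

Terms that occur in every transcript, such as "the" here, get a zero weight and therefore retrieve nothing, which is the usual motivation for inverse document frequency.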

The Démosthènes project

DEMOSTHENES aims to provide a multimedia programme for learning and correcting spoken Dutch. The resulting tool will detect and correct the typical Dutch pronunciation errors made by any French-speaking learner. It will be integrated into a course covering the essential elements of the pronunciation of the language, with exercises targeting each learner's own difficulties.

LKIT (German, English, etc.)

Conclusion

Demos: http://www.babeltech.com or http://tcts.fpms.ac.be/synthesis
