tage les té s ,C,G,T - lcqb.upmc.fr · Just as with the PAM matrix, we will compute the BLOSUM...


Page 1: Substitution matrices for nucleic acids

■ 4-letter alphabet: A, C, G, T

■ Identity matrix => does not improve the model; not suited to the evolution of nucleic acid sequences

■ Transition/transversion matrix => penalizes transversions (purine <=> pyrimidine) more heavily than transitions (purine <=> purine, pyrimidine <=> pyrimidine)

Identity matrix:

      A  C  G  T
   A  1  0  0  0
   C  0  1  0  0
   G  0  0  1  0
   T  0  0  0  1

Transition/transversion matrix:

      A  C  G  T
   A  3  0  1  0
   C  0  3  0  1
   G  1  0  3  0
   T  0  1  0  3
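
The transition/transversion matrix can be generated from the purine/pyrimidine classes rather than typed by hand. The following is a minimal sketch, assuming the scores shown on the slide (3 for a match, 1 for a transition, 0 for a transversion); the function names are illustrative and not part of the course material.

```python
# Purines and pyrimidines, as used in the transition/transversion definition.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def nucleotide_score(x, y, match=3, transition=1, transversion=0):
    """Score one pair of nucleotides with a transition/transversion scheme."""
    if x == y:
        return match
    # A transition stays within one chemical class (A<->G or C<->T).
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


def score_ungapped(s, t):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(nucleotide_score(x, y) for x, y in zip(s, t))


if __name__ == "__main__":
    # Rebuild the 4x4 matrix row by row.
    for x in "ACGT":
        print(x, [nucleotide_score(x, y) for y in "ACGT"])
```

Setting `transition=0` gives back the identity-style scoring, up to the value chosen for a match.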

Page 2: Substitution matrices for amino acids

■ During evolution:

❑ Some amino acids are "preferentially" replaced by others

   ❑ For example, they have similar physico-chemical properties

❑ Some amino acids are more conserved than others

   ❑ For example, they are essential to the 3D structure of proteins (such as tryptophan, W/Trp)

Page 3: Substitution matrices for amino acids

■ Matrices based on the physico-chemical properties of amino acids:

   hydrophobicity matrices

   secondary-structure matrices

   matrices based on comparisons of proteins that share the same 3-D structure

■ Matrices based on the substitutions between amino acids during evolution

The "log odds": $S_{ij} = \log \left[ q_{ij} / (p_i \, p_j) \right]$

$q_{ij}$ = probability of the substitution of i by j

$p_i$ = normalized probability of occurrence of residue i

$p_j$ = normalized probability of occurrence of residue j
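
As a small illustration of the log-odds formula, the sketch below evaluates it for one pair. The base of the logarithm is not given on this slide; base 2 is an assumption here, and the function name and input values are hypothetical.

```python
from math import log2


def log_odds(q_ij, p_i, p_j):
    """Log-odds score: observed pair probability over the chance expectation.

    Base 2 is an assumption; the formula on the slide does not fix the base.
    """
    if q_ij <= 0 or p_i <= 0 or p_j <= 0:
        raise ValueError("probabilities must be strictly positive")
    return log2(q_ij / (p_i * p_j))


if __name__ == "__main__":
    # A pair observed exactly as often as chance predicts scores 0.
    print(log_odds(0.25, 0.5, 0.5))
    # A pair observed twice as often as chance predicts scores +1 bit.
    print(log_odds(0.5, 0.5, 0.5))
```

A positive value means the pair is seen more often than independence would predict, a negative value less often.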

Page 4: BLOSUM matrix example

Alphabet: A = {A, B, C}

Example block (4 sequences, 6 columns):

   ABBCAC
   ACBBAC
   ABABAC
   CABCBC

Frequency of each amino acid pair for each column k:

$f_{ii}^{(k)} = \binom{n_i}{2} = \frac{n_i (n_i - 1)}{2}$, $f_{ij}^{(k)} = n_i \, n_j$ ($i \neq j$)

where $n_i$ is the number of observations of residue i in column k.

[Embedded slide: Calculating a BLOSUM Matrix]

Just as with the PAM matrix, we will compute the BLOSUM score as the (log) ratio of the observed probability of substitution of one amino acid by another divided by the probability expected purely due to chance. First the numerator:

1. Count pair frequencies for each pair of amino acids i and j, for each column k of each block:

$c_{ii}^{(k)} = \binom{n_i}{2}$ for "like" comparisons, $c_{ij}^{(k)} = n_i \, n_j$ for "unlike" comparisons,

where $n_i$ = the number of times residue i was observed in the column.

E.g., 1st column is AACABA:

   Pair  n_i  n_j  Count
   AA    4    4    4(4-1)/2 = 6
   AB    4    1    (4)(1) = 4
   AC    4    1    (4)(1) = 4
   BB    1    1    (1)(1-1)/2 = 0
   BC    1    1    (1)(1) = 1
   CC    1    1    (1)(1-1)/2 = 0
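
The counting rule for a single column can be sketched as follows. This is a minimal illustration, assuming a column is given as a string of residues; the function name and the optional `alphabet` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def column_pair_counts(column, alphabet=None):
    """Pair counts for one column: n_i(n_i-1)/2 for like pairs, n_i*n_j otherwise."""
    n = Counter(column)
    # By default only the residues present in the column are considered.
    residues = sorted(alphabet) if alphabet is not None else sorted(n)
    counts = {}
    for i, j in combinations_with_replacement(residues, 2):
        if i == j:
            counts[i + j] = n[i] * (n[i] - 1) // 2
        else:
            counts[i + j] = n[i] * n[j]
    return counts


if __name__ == "__main__":
    # The column AACABA from the embedded slide.
    print(column_pair_counts("AACABA"))
```

For a column of n residues the counts always add up to n(n-1)/2, which is 15 for the six-residue column AACABA.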

Page 5: BLOSUM matrix example (continued)

Frequency of each amino acid pair for each column k

First column, k = 1 (residues A, A, A, C):

   Pair  Count
   AA    3
   AB    0
   AC    3
   BB    0
   BC    0
   CC    0

Page 6: BLOSUM matrix example (continued)

Frequency of each amino acid pair for each column k = 1, ..., 6:

   Pair  k=1  k=2  k=3  k=4  k=5  k=6
   AA    3    0    0    0    3    0
   AB    0    2    3    0    3    0
   AC    3    1    0    0    0    0
   BB    0    1    3    1    0    0
   BC    0    2    0    4    0    0
   CC    0    0    0    1    0    6

Page 7: BLOSUM matrix example (continued)

Sum of the frequencies of each pair over all columns: $f_{ij} = \sum_k f_{ij}^{(k)}$

   Pair  k=1  k=2  k=3  k=4  k=5  k=6  f_ij
   AA    3    0    0    0    3    0    6
   AB    0    2    3    0    3    0    8
   AC    3    1    0    0    0    0    4
   BB    0    1    3    1    0    0    5
   BC    0    2    0    4    0    0    6
   CC    0    0    0    1    0    6    7

[Embedded slide: Calculating a BLOSUM Matrix (continued)]

Alignment shown on the embedded slide:

   AABCD---BBCDA
   DABCD-A-BBCBB
   BBBCDBA-BCCAA
   AAACDC-DCBCDB
   CCBADB-DBBDCC
   AAACA---BBCCC

2. Sum the scores for each column across columns:

$c_{ij} = \sum_k c_{ij}^{(k)}$

3. Normalize the pair frequencies so they will sum to 1:

$T = \sum_{i \ge j} c_{ij} = w \, \frac{n(n-1)}{2}$, $q_{ij} = \frac{c_{ij}}{T}$

where w = number of columns and n = number of sequences.

For the previous example, the $q_{AB}$ calculation across columns is:

$q_{AB} = \frac{4 + 8 + 0 + 0 + 0 + 0 + 0}{7(6)(5)/2} = \frac{12}{105}$
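
The summed pair counts of the four-sequence example block (ABBCAC, ACBBAC, ABABAC, CABCBC) can be cross-checked by enumerating every pair of sequences in every column directly instead of applying the closed-form rule. A minimal sketch, with illustrative names:

```python
from collections import Counter
from itertools import combinations

# The 4-sequence, 6-column example block over the alphabet {A, B, C}.
BLOCK = ["ABBCAC", "ACBBAC", "ABABAC", "CABCBC"]


def summed_pair_counts(block):
    """Count residue pairs over all sequence pairs and all columns of a block."""
    totals = Counter()
    for column in zip(*block):  # iterate over alignment columns
        for x, y in combinations(column, 2):  # every pair of sequences
            totals["".join(sorted((x, y)))] += 1  # AB and BA are the same pair
    return totals


if __name__ == "__main__":
    totals = summed_pair_counts(BLOCK)
    for pair in sorted(totals):
        print(pair, totals[pair])
    # The grand total equals w * n * (n - 1) / 2 = 6 * 4 * 3 / 2.
    print("T =", sum(totals.values()))
```

Enumerating pairs is quadratic in the number of sequences, so the closed-form counts are preferable for large blocks; the enumeration is only meant as a check.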

Page 8: BLOSUM matrix example (continued)

Normalize so that the frequencies sum to 1

With w = 6 columns and n = 4 sequences:

$T = \frac{6 \times 4 \times (4 - 1)}{2} = 36$

Page 9: BLOSUM matrix example (continued)

Normalize so that the frequencies sum to 1: $q_{ij} = f_{ij} / T$, with T = 36 (values truncated to two decimals)

   Pair  k=1  k=2  k=3  k=4  k=5  k=6  f_ij  q_ij
   AA    3    0    0    0    3    0    6     0.16
   AB    0    2    3    0    3    0    8     0.22
   AC    3    1    0    0    0    0    4     0.11
   BB    0    1    3    1    0    0    5     0.14
   BC    0    2    0    4    0    0    6     0.16
   CC    0    0    0    1    0    6    7     0.19

Page 10: BLOSUM matrix example (continued)

Computing the denominator: $p_i$

$S_{ij} = \log \left[ q_{ij} / (p_i \, p_j) \right]$

$p_A = q_{AA} + (q_{AB} + q_{AC})/2 = 0.16 + (0.22 + 0.11)/2 = 0.325$

$p_B = 0.33$

$p_C = 0.325$

[Embedded slide: Calculating a BLOSUM Matrix (continued)]

Now, we will calculate the denominator of the odds ratio.

4. Calculate the expected probability of occurrence of the ith residue in an (i,j) pair:

$p_i = q_{ii} + \sum_{j \neq i} \frac{q_{ij}}{2}$

5. The desired denominator is the expected frequency for each pair (assuming independence):

$e_{ii} = p_i^2$, $e_{ij} = 2 \, p_i \, p_j$ ($i \neq j$)

6. Each entry for (i,j) in the odds matrix is then equal to $q_{ij} / e_{ij}$

7. Log odds ratio:

$s_{ij} = \log_2 \frac{q_{ij}}{e_{ij}}$

8. Value stored for BLOSUM = $2 \, s_{ij}$, rounded to the nearest integer ("half bit" units)

Page 11: BLOSUM matrix example (continued)

Computing the scores from $q_{ij}$ and $p_i$

   Pair  q_ij
   AA    0.16
   AB    0.22
   AC    0.11
   BB    0.14
   BC    0.16
   CC    0.19

$p_A = 0.325$, $p_B = 0.33$, $p_C = 0.325$

$S_{ij} = \log_2 \left[ q_{ij} / e_{ij} \right]$

$e_{ij} = p_i^2$ if $i = j$

$e_{ij} = 2 \, p_i \, p_j$ if $i \neq j$

$S_{AA} = \log_2 \left[ q_{AA} / e_{AA} \right] = \log_2 \left[ 0.16 / (p_A)^2 \right] = \log_2 \left[ 0.16 / 0.1056 \right] \approx 0.60$

$S_{AB} = \log_2 \left[ q_{AB} / (2 \, p_A \, p_B) \right] = \log_2 \left[ 0.22 / 0.2145 \right] \approx 0.036$

Resulting score matrix (lower triangle):

         A        B        C
   A     0.60
   B     0.036    0.36
   C    -0.94    -0.42     0.85
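
The whole chain from summed pair counts to log-odds scores can be sketched as below. The pair counts are those of the example (AA 6, AB 8, AC 4, BB 5, BC 6, CC 7). The sketch assumes base-2 logarithms and uses exact fractions instead of the two-decimal values of the slides, so its results can differ from the slide values in the second decimal. All names are illustrative.

```python
from fractions import Fraction
from math import log2

# Summed pair counts of the example block (T = 36).
COUNTS = {"AA": 6, "AB": 8, "AC": 4, "BB": 5, "BC": 6, "CC": 7}


def blosum_scores(counts):
    """Return (p, s): residue probabilities and log2-odds scores per pair."""
    total = sum(counts.values())
    q = {pair: Fraction(c, total) for pair, c in counts.items()}
    residues = sorted({r for pair in q for r in pair})
    # p_i = q_ii + sum over j != i of q_ij / 2
    p = {}
    for r in residues:
        unlike = sum(v for pair, v in q.items() if r in pair and pair[0] != pair[1])
        p[r] = q.get(r + r, Fraction(0)) + unlike / 2
    s = {}
    for pair, v in q.items():
        i, j = pair
        if v == 0:
            s[pair] = float("-inf")  # pair never observed
            continue
        e = p[i] ** 2 if i == j else 2 * p[i] * p[j]  # expected frequency
        s[pair] = log2(float(v / e))
    return p, s


if __name__ == "__main__":
    p, s = blosum_scores(COUNTS)
    print({r: float(v) for r, v in p.items()})
    for pair in sorted(s):
        # Second number: score in half-bit units, as stored in BLOSUM tables.
        print(pair, round(s[pair], 3), round(2 * s[pair]))
```

With exact arithmetic every residue has probability 1/3 in this example, because each letter occurs 8 times among the 24 positions of the block.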

Page 12: Substitution matrices for amino acids

Substitution penalties:

$S_{ij} > 0$ <=> replacement considered frequent

$S_{ij} < 0$ <=> rare replacement, unlikely between homologous proteins

Page 13: Substitution matrices for amino acids

■ Substitution matrices built from the observed frequencies of substitution between "related" sequences

■ PAM matrices = Point Accepted Mutation (Dayhoff 1979)

■ BLOSUM matrices = BLOcks SUbstitution Matrix (Henikoff & Henikoff 1992)

Page 14: BLOSUM matrices

■ Built from Blocks = local multiple alignments without insertions or deletions for a protein family

■ Score computation: $S_{ij} = \log \left[ q_{ij} / (p_i \, p_j) \right]$

~2000 blocks, 500 protein families

Page 15: BLOSUM matrices

■ Clustering of the sequences within their block

   – Clustering according to an identity threshold

      • Threshold = 80% <=> BLOSUM80

      • Threshold = 60% <=> BLOSUM60

■ Scores are computed per cluster => reduces the redundancy caused by the number of identical pairs (for example, over-representation of certain sequences)

Page 16: BLOSUM 62

Page 17: Alignment example with BLOSUM62

■ Alignment of the two protein sequences RDISLVKNAGI and RNILVSDAKNVGI with "BLOSUM62"

   RDISLV---KNAGI
   | | ||   || ||
   RNI-LVSDAKNVGI

Match and substitution: see BLOSUM62; indel: -5

Score = 5+1+4-5+4+4-5-5-5+5+6+0+6+4 = 19
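
The score of this alignment can be recomputed column by column. The sketch below is an illustration only: it hard-codes just the BLOSUM62 entries that this alignment needs (the values are those of the sum on the slide) and applies the linear indel penalty of -5 per gapped position; the names are hypothetical.

```python
# Only the BLOSUM62 entries needed for this alignment, not the full matrix.
PAIR_SCORES = {
    ("R", "R"): 5, ("D", "N"): 1, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("N", "N"): 6, ("A", "V"): 0, ("G", "G"): 6,
}
GAP = -5  # linear indel penalty, applied to every gapped column


def pair_score(x, y):
    """Score one alignment column (residue/residue or residue/gap)."""
    if x == "-" or y == "-":
        return GAP
    if (x, y) in PAIR_SCORES:
        return PAIR_SCORES[(x, y)]
    return PAIR_SCORES[(y, x)]  # the matrix is symmetric


def alignment_score(top, bottom):
    """Sum the column scores of two aligned (gapped) sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(top, bottom))


if __name__ == "__main__":
    print(alignment_score("RDISLV---KNAGI", "RNI-LVSDAKNVGI"))
```

A pair missing from the partial table raises a `KeyError`, which is intentional: the table is not a substitute for the full BLOSUM62 matrix.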

Page 18: PAM matrices

Based on global multiple alignments of very similar sequences (>85% identity); the mutations are called "accepted" because they do not significantly change the function of the protein.

1) Alignment of sequences (71 protein families, 1300 sequences)

2) Counting of substitutions by pairwise comparison: $A_{ij}$

3) Computation of the mutability: $m_i = \sum_j A_{ij} / f_i$ (for each amino acid i, $f_i$ = frequency of occurrence)

4) Computation of the scores: $R_{ij} = M_{ij} / f_i$ with $M_{ij} = m_i A_{ij} / \sum_i A_{ij}$, normalized such that $\sum R_{ij} = 1$

   => mutation matrices MDM (Mutation Data Matrix)

5) Extrapolation to more distant sequences: NDM-n = (NDM-1)^n

   (1-PAM = 1 accepted mutation per 100 residues)

6) Transformation into a "log odds" matrix: PAM-n = log(NDM-n)
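
The extrapolation step raises the 1-PAM mutation matrix to the n-th power. The sketch below shows the idea on a toy two-letter alphabet; the matrix values are hypothetical and are not Dayhoff data, and the helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical 1-step matrix: each letter is kept with probability 0.99,
# i.e. about 1 change per 100 residues.
ONE_STEP = [[0.99, 0.01], [0.01, 0.99]]

if __name__ == "__main__":
    for n in (1, 40, 120, 250):
        row = mat_pow(ONE_STEP, n)[0]
        print(n, [round(x, 4) for x in row])
```

As n grows, each row drifts toward the background distribution, which is why high-n matrices suit divergent sequences. Residues that mutate more than once also explain why n can exceed 100 without the sequences becoming completely different.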

Page 19: PAM matrix (Point Accepted Mutation)

■ The "1 -> n" extrapolation rests on the strong assumption that the "mutation rate" is constant and equiprobable along the entire length of the sequences

■ Sampling bias:

   1978: the set of sequences was not representative (1300 sequences, 71 families)

   1992: update with 16,130 sequences and 2,621 families

Page 20: PAM 10

Page 21: PAM matrices

■ Choose the matrix N according to the assumed evolutionary distance between the sequences

   ❑ The higher "N" is, the better suited the matrix is to comparing divergent sequences

   ❑ If the divergence is not known (which is generally the case), try several matrices

Note: N > 100 is possible when a residue has mutated several times

Page 22: BLOSUM matrix

■ Choose the matrix N according to the assumed percentage identity between the sequences

   ❑ The higher "N" is, the better suited the matrix is to comparing sequences of high identity

   ❑ If the identity is not known (which is generally the case), try several matrices

Page 23: Which scoring matrix to use?

■ "Low divergence / high identity":

   ❑ PAM 40 or BLOSUM 80

■ "Medium divergence / medium identity":

   ❑ PAM 120 or BLOSUM 62

■ "High divergence / low identity":

   ❑ PAM 250 or BLOSUM 45

There is no perfect matrix!