Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies...

22
Genomics Technologies Suitable for Population Studies Michael Snyder December 18, 2007 GGTTCCAAAAGTTTATTGGATGCCGTTTCAGTACATTTATCGTTTGCTTTGGATGCCCTAATTAAAAGTGA CCCTTTCAAACTGAAATTCATGATACACCAATGGATATCCTTAGTCGATAAAATTTGCGAGTACTTTCAAA GCCAAATGAAATTATCTATGGTAGACAAAACATTGACCAATTTCATATCGATCCTCCTGAATTTATTGGCG TTAGACACAGTTGGTATATTTCAAGTGACAAGGACAATTACTTGGACCGTAATAGATTTTTTGAGGCTCAG CAAAAAAGAAAATGGAAATACGAGATTAATAATGTCATTAATAAATCAATTAATTTTGAAGTGCCATTGTT TTAGTGTTATTGATACGCTAATGCTTATAAAAGAAGCATGGAGTTACAACCTGACAATTGGCTGTACTTCC AATGAGCTAGTACAAGACCAATTATCACTGTTTGATGTTATGTCAAGTGAACTAATGAACCATAAACTTGG TCATTCG

Transcript of Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies...

Page 1: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Genomics Technologies Suitable for Population Studies

Michael Snyder

December 18, 2007

GGTTCCAAAAGTTTATTGGATGCCGTTTCAGTACATTTATCGTTTGCTTTGGATGCCCTAATTAAAAGTGACCCTTTCAAACTGAAATTCATGATACACCAATGGATATCCTTAGTCGATAAAATTTGCGAGTACTTTCAAAGCCAAATGAAATTATCTATGGTAGACAAAACATTGACCAATTTCATATCGATCCTCCTGAATTTATTGGCGTTAGACACAGTTGGTATATTTCAAGTGACAAGGACAATTACTTGGACCGTAATAGATTTTTTGAGGCTCAGCAAAAAAGAAAATGGAAATACGAGATTAATAATGTCATTAATAAATCAATTAATTTTGAAGTGCCATTGTTTTAGTGTTATTGATACGCTAATGCTTATAAAAGAAGCATGGAGTTACAACCTGACAATTGGCTGTACTTCCAATGAGCTAGTACAAGACCAATTATCACTGTTTGATGTTATGTCAAGTGAACTAATGAACCATAAACTTGGTCATTCG

Page 2: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Genomics Technologies

1)Mapping DNA sequence variationvariation

2) Mapping transcripts2) Mapping transcripts

3) Mapping regulatory information) g g y

Page 3: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Genome Tiling Arrays

25 or 50mer

Massively Parallel Sequencing

AGTTCACCTAAGA…

CTTGAATGCCGAT…

GTCATTCCGCAAT…

Page 4: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Human Variation:SNP Mapping

Methods:

Affymetrix 6.0900K SNPs + CNVs

Illumina 1M SNP + CNV1M SNPs + CNVs

Sequencing basedSequencing based Methods

Page 5: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Mapping Structural Variation in Humans

- Thought to be Common12% of the genome12% of the genome (Redon et al. 2006)

- Likely involved in phenotype y p ypvariation and disease

- Most methods for detection are CNVs

low resolution (>50 kb)

Page 6: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

High Resolution Comparative Genomic Hybridization

Chromosome 22Chromosome 22

Array Based Methods:BAC arraysBAC arrays

Affy & Illumina ~50kb

Nimblegen 400 K Array

Person 1

igna

l

Nimblegen 400 K Array

1 array (>60kb)

8 array set (>10 kb?)

Person 2Si

8 array set (>10 kb?)

2.1 M feature arrays

1 array (>5 kb?)

Chromosome Position

1 array (>5 kb?)Urban et al. (2006) PNAS

Page 7: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

High Resolution-Paired-End Mapping (HR-PEM)

CircularizeShear to 3 kbAdaptor ligation Bio Bio

Genomic DNA Fragments

Bio

Random Cleavage Bio

Bio

Bio

454 M i l P ll l S i

200-300bp

454 Massively Parallel Sequencing (250bp reads, 400 000 reads/run)

Map paired ends to human reference genomeMap paired ends to human reference genome

Page 8: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Summary of PEM Results

NA15510 (European?, female)

NA18505 (Yoruba, female)

# of sequence reads > 10 M > 21 M# of sequence reads > 10 M. > 21 M.

Paired ends uniquely mapped > 4.2 M. > 8.6 M.

Fold coverage ~ 2.1x ~ 4.3x

Predicted Structural 478 839Variants*IndelsInversion breakpoints

47842751

83975881

Estimated total variants*genome wide

759 902genome-wide

*at this resolution

Page 9: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Size distribution of Structural Variants[NA15510 excluding complex rearrangements and inversion breakpoints]

400

450

400

[NA15510, excluding complex rearrangements, and inversion breakpoints]

300

350

400

300

400

bin

150

200

250

200

ount

for e

ach

50

100

150

100

Co

0 1 2 3 4 5 6 7 8 9 10

x 104

0

>10020 40 60 800Sizes of predicted deletions and insertions [kb]

Page 10: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Chromatin Immunoprecipitation p50

p65

Crosslink with

p50p65

Sonicate

Formaldehyde

p50

p65IP with

Specific AbSpecific Ab

Reverse Crosslinkingg

Label DNA and

Hybridize toMassive

DNAHybridize to Microarray

DNA Sequencing

Page 11: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Pol II: Chromosome 22 S f T kSummary of Tracks

HeLa S3 Rep1

HeLa S3 Rep2HeLa S3 Rep2

HeLa S3 Rep3

HeLa S3 Input

QuickTime™ and a decompressor

are needed to see this picture.

HEK293 Rep1

HEK293 Rep2

HEK293 Rep3QuickTime™ and a

decompressorare needed to see this picture.

HEK293 Input

K562 Rep2

K562 Rep3QuickTime™ and a

NB4 Rep1

decompressorare needed to see this picture.

K562 Input

QuickTime™ and a decompressor

are needed to see this picture.

NB4 Rep2

NB4 Rep3

Page 12: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

EpigenomicsEpigenomics

1)Mapping Chromatin Modifications

2)Mapping DNA Methylation

Page 13: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Mapping Transcribed Regions

25 or 36mer

• Probe Affymetrix or Nimblegen Arrays with cDNA

• >50% of transcribed regions not annotated

Page 14: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

ENCODE Project: Transcribed and Regulatory Regions

Page 15: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

F t Di tiFuture Directions

1) Whole genome sequencing.

2) Whole genome mapping of transcribed regions and regulatory information inregions and regulatory information in small samples.

3) Proteomics approaches.

Page 16: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Technologies for Large Population Studies: Possible Recommendations

1) High density SNPs1) High density SNPs2) High resolution CNVs/SVs 2 kb or larger3) Whole genome gene expression (coding and3) Whole genome gene expression (coding and

noncoding). What tissues?4) Sera proteome and metabolomics analysis4) Sera proteome and metabolomics analysis.

Other fluids?5) Whole genome TF/Methylation mapping. 5) o e ge o e / et y at o app g

Which ones? What tissues?

Page 17: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy
Page 18: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

STAT1 x 3 Reps & Array (IRF1)y ( )

Array

B1

B2B2

B3

Mapped > 40K STAT1 Binding Sites

Page 19: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

STAT1 ChIP-Seq on IFN Treated HeLa Cells using Illumina Sequencing

with the University of British ColumbiaChIP-chip (NimbleGen 2.1 M feature arrays)ChIP-seq (15.1 M mapped reads for the IFNγ)

with the University of British Columbia

ChIP-ChIP: IFNγ/ unstim

ChIP-seq: IFNγ

ChIP-seq: unstim

Robertson et al. 2007

Page 20: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

ENCODE Pilot PhaseENCODE Pilot Phase(Analyze 1% of the Human Genome)

Mapping transcribed regions 13 E i t13 Experiments

Mapping regulatory information >150 ChIP Chip experiments>150 ChIP Chip experimentsDNAase hypersensitive sites

Page 21: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Massively Parallel SequencingMassively Parallel Sequencing Technologies

454400K d400K reads 250 b

Illumina and ABI>40M reads 36 b

Page 22: Genomics Technologies Suitable for Population Studies · 2008-02-21 · Genomics Technologies 1)Mapping DNA sequence variation 2) Mapping transcripts2) Mapping transcripts 3))ggy

Distribution of SVs

**

*

VCFS

*