tics Techniques COD EK Nov07

  • 8/3/2019 tics Techniques COD EK Nov07

    Bioinformatics tools and techniques: Into the heart of darkness

    Elaine Kenny

    Colm O'Dushlaine

    15/11/07

    Summary

    Simple overviews of some of the tools and methods used by EK and COD

    TK notebook

    get_hapmap_snps.pl: retrieve HapMap (HM) genotype information for a list of SNPs

    GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks. Score SNPs depending on how many of these landmarks they overlap with

    ld_expander.pl: find SNPs in LD with SNPs of interest, based on user-specified r² and LD window (distance between SNPs)

    STATA

    VIM: command line text editor

    Lab website

    TK notebook

    Application for saving notes, to-do lists, daily logs, and any other kind of textual information in a place where you can find it all again, and where related information is easily found

    Easy to edit and rapidly searchable

    DEMO editing

    DEMO search

    get_hapmap_snps.pl

    Simple script to read in a 1-column list of SNPs and retrieve HapMap genotypes

    Can select population and strand

    DEMO

    Retrieved data can be loaded into HaploView

    DEMO

    cross_ref_scored.pl

    Score SNPs based on how many putatively functional regions they overlap with:

    On a per gene / chromosome basis

    Gene basis:

    Type: perl cross_ref_scored.pl file_A file_B file_C ... where

    file_A - 2-column file of SNPs (format = id, location)

    file_B - 3-column file of EXONS (format = id/name, start, stop)

    file_C ... - whatever you want, (format = id/name, start, stop)

    i.e. other regions like CpGs, TFBS, clusters. Any order.
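The gene-basis scoring described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script itself: the function names, the whitespace-delimited parsing, the inclusive start/stop convention, and the rule of one point per region file that overlaps the SNP are all assumptions, and the SNP and region records are toy data.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (id/name, start, stop), as in file_B, file_C ...


def parse_snps(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 2-column SNP records (id, location); assumes no header line."""
    snps = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            snps[fields[0]] = int(fields[1])
    return snps


def parse_regions(lines: Iterable[str]) -> List[Region]:
    """Parse 3-column region records (id/name, start, stop)."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def score_snps(snps: Dict[str, int], region_sets: List[List[Region]]) -> Dict[str, int]:
    """Assumed rule: one point per region set with a region covering the SNP."""
    scores = {}
    for snp_id, pos in snps.items():
        scores[snp_id] = sum(
            any(start <= pos <= stop for _, start, stop in regions)
            for regions in region_sets
        )
    return scores


# Toy records standing in for file_A (SNPs), file_B (exons) and file_C (CpGs).
snps = parse_snps(["rs1 150", "rs2 900", "rs3 5000"])
exons = parse_regions(["exon1 100 200", "exon2 850 950"])
cpgs = parse_regions(["cpg1 140 160"])
scores = score_snps(snps, [exons, cpgs])
print(scores)
```

Because each region set contributes at most one point, the order in which the region files are supplied does not affect the score.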

    cross_ref_scored.pl example output:

    Can then be merged with HapMap / Perlegen to retrieve MAF data for SNPs

    Merge cross_ref_scored data with HapMap/Perlegen data using merge_per_hap.pl

    Type:

    perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt

    Where:

    perlegen.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)

    hapmap.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)

    overlapped_region_scored.txt = scored SNP output from cross_ref_scored.pl
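A merge of this kind is essentially a join keyed on rsid. The sketch below shows the idea in Python under several assumptions: the layout of the scored file (rsid in column 1, score in column 2) is not given in the slides, the function names and output fields are hypothetical, and MAF is derived as the smaller of the reference allele frequency and its complement, which holds for biallelic SNPs. The records are toy data.

```python
def read_freqs(lines):
    """Map rsid -> (ref_allele, ref_allele_freq) from 3-column records."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            table[fields[0]] = (fields[1], float(fields[2]))
    return table


def maf(ref_allele_freq):
    """Minor allele frequency of a biallelic SNP from its reference allele frequency."""
    return min(ref_allele_freq, 1.0 - ref_allele_freq)


def merge_by_rsid(scored_lines, perlegen, hapmap):
    """Attach Perlegen and HapMap MAFs to each scored SNP (None if absent)."""
    merged = []
    for line in scored_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        rsid = fields[0]
        row = {"rsid": rsid, "score": int(fields[1])}  # assumed column layout
        for source, table in (("perlegen", perlegen), ("hapmap", hapmap)):
            entry = table.get(rsid)
            row[source + "_maf"] = maf(entry[1]) if entry else None
        merged.append(row)
    return merged


# Toy records; real input would be read from the three files on the command line.
perlegen = read_freqs(["rs1 A 0.80", "rs2 C 0.35"])
hapmap = read_freqs(["rs1 A 0.75", "rs3 G 0.10"])
merged = merge_by_rsid(["rs1 2", "rs2 1"], perlegen, hapmap)
for row in merged:
    print(row)
```

SNPs missing from one panel are kept with a `None` frequency rather than dropped, so the scored list stays complete.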

    cross_ref.pl applied to WGA data

    cross_ref.pl: Scoring SNPs throughout genome

    Data analysed on coding/non-coding basis

    (coding)

    perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 \
        WTCCC_T2D_chr22_without_inferred.forCrossRef \
        WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 \
        WGA_databases/coding_synon_SNPs_UCSC.clean=2 \
        WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 \
        WGA_databases/Triplexes_may2006.bed=2 \
        WGA_databases/splice_site_SNPs_UCSC.clean=2 \
        > Overlapped_regions_scored.WTCCC.chr22.coding.log &

    (input-dependent, coding/non-coding dependent, arbitrary)

    (noncoding)

    perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 \
        WTCCC_T2D_chr22_without_inferred.forCrossRef \
        WGA_databases/TFBS.chr22=1 \
        WGA_databases/CpG_islands_UCSC.uniqid=1 \
        WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 \
        WGA_databases/promoters_knowngene_hg18.txt=1 \
        WGA_databases/sno_or_miRNA_UCSC.uniqid=1 \
        > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
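In the commands above, each annotation file is given as `path=N`. One plausible reading is that `N` is the score a SNP receives for overlapping a feature in that file. The Python sketch below illustrates that reading only: how cross_ref.pl actually combines the values is not stated in the slides, so summing them is an assumption, and the function names and coordinates are hypothetical.

```python
def parse_weighted_arg(arg):
    """Split a 'path=weight' argument into (path, integer weight)."""
    path, sep, weight = arg.rpartition("=")
    if not sep or not path:
        raise ValueError("expected path=weight, got %r" % arg)
    return path, int(weight)


def weighted_score(pos, annotations):
    """Sum the weights of every annotation set with a region covering pos.

    annotations: list of (weight, [(start, stop), ...]) pairs.
    Summation is an assumed combination rule.
    """
    return sum(
        weight
        for weight, regions in annotations
        if any(start <= pos <= stop for start, stop in regions)
    )


args = [
    "WGA_databases/coding_non_synon_SNPs_UCSC.clean=3",
    "WGA_databases/coding_synon_SNPs_UCSC.clean=2",
    "WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1",
]
parsed = [parse_weighted_arg(a) for a in args]

# Toy regions standing in for the contents of the three files above.
toy_regions = [[(100, 200)], [(150, 300)], [(1000, 2000)]]
annotations = [(weight, regions) for (_, weight), regions in zip(parsed, toy_regions)]

for pos in (160, 250, 5000):
    print(pos, weighted_score(pos, annotations))
```

Splitting on the last `=` keeps the parser safe for paths that themselves contain an equals sign.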

    cross_ref.pl

    cross_ref.pl output:

    Load into STATA. If SNPs have e.g. association p-values, calculate an adjusted p-value (R. Anney) as -log10[P] + [cross_ref_score]
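The adjustment is simple arithmetic, shown here as a small Python sketch (the talk itself does this step in STATA; the function name and the example values are hypothetical). Note that the result sits on the -log10 scale, so larger values indicate stronger combined evidence, and it is a ranking score rather than a probability.

```python
import math


def adjusted_score(p_value, cross_ref_score):
    """Return -log10(P) + cross_ref_score for one SNP."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value) + cross_ref_score


# A SNP with P = 1e-4 and a cross_ref score of 3 gives a value of about 7.
example = adjusted_score(1e-4, 3)
print(round(example, 3))
```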

    GeneViewer.pl

    GeneViewer.pl: Visualise overlapping features (e.g. exons, SNPs etc.) along e.g. your gene of interest (html output)

    ld_expander.pl

    Find proxies (SNPs in LD) for a list of SNPs

    User specifies the r² and LD window

    Currently configured to obtain proxies from HapMap (HM) CEU

    Result is a list of additional proxy SNPs that have been obtained by LD expansion

    DEMO

    Note: don't LD expand >150000 SNPs, or HapMap will ban you! COD has an alternative version that uses local pre-computed pairwise LD SNP files
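LD expansion against local pairwise LD data can be sketched as a filter over SNP pairs. The Python below is an illustration under assumptions, not ld_expander.pl or COD's alternative version: the record layout (snp_a, pos_a, snp_b, pos_b, r2), the function name, and the toy values are all hypothetical.

```python
def expand_by_ld(query_snps, ld_pairs, min_r2, max_distance):
    """Return proxies: SNPs paired with a query SNP at r2 >= min_r2
    and no more than max_distance base pairs away.

    ld_pairs: iterable of (snp_a, pos_a, snp_b, pos_b, r2) records
    (an assumed layout for pre-computed pairwise LD).
    """
    query = set(query_snps)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue
        # A pair can list the query SNP on either side.
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        elif snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)


# Toy pairwise LD records.
ld_pairs = [
    ("rs1", 1000, "rs2", 1500, 0.95),   # passes both thresholds
    ("rs1", 1000, "rs3", 90000, 0.90),  # outside the LD window
    ("rs4", 2000, "rs1", 1000, 0.85),   # query SNP listed second
    ("rs5", 3000, "rs6", 3100, 1.00),   # neither SNP is a query SNP
    ("rs1", 1000, "rs7", 1200, 0.50),   # r2 too low
]
proxies = expand_by_ld(["rs1"], ld_pairs, min_r2=0.8, max_distance=50000)
print(proxies)
```

Working from local files in this way avoids sending large numbers of queries to a remote server.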

    STATA

    Extremely powerful and flexible

    >65k rows handled (shock horror!)

    Can write scripts to automate tasks, e.g. read in a file, do analysis, save results

    When you use the GUI to run commands, the commands are shown in the command window, so you can save them in a do file

    COD, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis

    http://www.wtccc.org.uk/

    STATA example using WTCCC data

    Bipolar Disorder,

    Coronary Artery Disease,

    Crohn's Disease,

    Hypertension,

    Rheumatoid Arthritis,

    Type 1 Diabetes,

    Type 2 Diabetes

    DATA FORMAT

    3 folders:

    Basic: each case collection against the pooled control groups 58C and UKBS

    Combined controls: combining other case collections as controls

    Combined cases: combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune)

    Data are split by chromosome

    Questions

    How do I get all of the chromosome data for my gene of interest into one file?

    How do I search easily all of the SNP information for my gene(s) of interest?

    Create a .do file for all manipulations that you want to carry out to the data

    DEMO

    Good starting resource: http://www.ats.ucla.edu/stat/stata/
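The talk answers these questions with a STATA .do file. As a language-neutral illustration of the same logic, the Python sketch below pulls the SNPs for several genes out of per-chromosome data into one combined table. Everything specific here is an assumption: the record layout (rsid, position, p-value), the gene names and coordinates, and the function names are hypothetical and do not reflect the actual WTCCC file format.

```python
def snps_in_region(per_chrom_records, chrom, start, stop):
    """Return the records on one chromosome that fall inside [start, stop]."""
    rows = per_chrom_records.get(chrom, [])
    return [row for row in rows if start <= row[1] <= stop]


def collect_genes(per_chrom_records, genes):
    """Build one combined table for several genes.

    per_chrom_records: chromosome -> list of (rsid, position, p_value)
    genes: gene name -> (chromosome, start, stop)
    """
    combined = []
    for name, (chrom, start, stop) in genes.items():
        for rsid, pos, p_value in snps_in_region(per_chrom_records, chrom, start, stop):
            combined.append((name, chrom, rsid, pos, p_value))
    return combined


# Toy data standing in for files that are split by chromosome.
per_chrom = {
    "22": [("rs10", 1500, 0.04), ("rs11", 9000, 0.5)],
    "1": [("rs20", 300, 0.001)],
}
genes = {"GENE_A": ("22", 1000, 2000), "GENE_B": ("1", 100, 500)}

combined = collect_genes(per_chrom, genes)
for row in combined:
    print(row)
```

Tagging each row with the gene name keeps the combined table searchable by gene after the per-chromosome split has been removed.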

    VIM

    Vi Improved. Mainly UNIX but cross-platform text editor (available for Windows).

    Full list of commands outside scope of this demonstration

    Very fast and efficient, esp. with search and replace functions on large datasets

    Regular expression pattern matching

    DEMO

    Integrates with Cygwin (www.cygwin.com, a very useful UNIX emulator for Windows)

    Group website

    Some useful stuff up there!

    Please send information about current projects etc. Good for our image as a group and minimal effort required on your part

    DEMO

    Conclusions

    Small summary of some things you can do

    Slides and video demonstrations will be online at: http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/

    COD & EK available for advice (Fridays 9-9.02am)

    These things will help you in your work!!