## GOAL

The goal of this exercise is to become familiar with the IGV genome browser and to use it to visually inspect a BAM file. This should give some insight into the whole process of read mapping, alignment refinement and variant calling.

## DATA

The slides and exercises are located at: https://wiki.uio.no/projects/MBV-INFX410

The data you need for this exercise have been uploaded to: http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv

You will need to authenticate to download the data:

* user: variant
* password: calling

If the web server becomes overloaded, the data can also be retrieved from: freebee.abel.uio.no:/usit/abel/u1/timothyh/course_vc_2012_nb3_imbv

## EXERCISE

# Starting IGV

* Open a web browser and go to: http://www.broadinstitute.org/igv/download
* Launch the 1.2 GB version.
* Make sure you are using the correct reference genome: the genome drop-down menu at the top left should show "Human (1kg, b37+decoy)".
* Navigate to a specific location on chromosome 5 by pasting the following into the text box at the top of the window: 5:55,085,643-55,090,501
* Part of a gene model should appear at the bottom of the screen.

# Loading the tiles

* Open a second tab in your browser and go to: http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv
* Click your way down the directory structure to: inputData/human_g1k_v37_chr5/agilentV1/agilent37M.chr5.b37.bed
* Right click on this file and choose "Copy link address".
* Return to IGV >> File >> Load from URL... >> paste the URL from your clipboard >> OK >> supply username: variant and password: calling. The download will start, and after a few seconds IGV should display the tiles that were used to capture the fragments that were sequenced and mapped.
* Notice how the tiles overlap with the exons.
* Use the zoom control at the top right of the screen to zoom in a bit more, and notice that the tiles tend to be slightly larger than the exons.
* Use the "back" icon at the top of the screen to return to the original location, or simply paste 5:55,085,643-55,090,501 into the search box.
* You can also zoom in by double-clicking on the main part of the display.

# Loading BAM files

* In the same way as you opened the tile file, open the two BAM files listed below.
* The raw alignment: http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv/exerResults/04_advancedPipelineWithFuncAnnot/aln.posiSrt.bam
* The refined alignment: http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv/exerResults/04_advancedPipelineWithFuncAnnot/aln.posiSrt.withRG.clean.dedup.recal.bam
* Notice how the reads stack up to provide coverage for the targeted regions, with coverage tailing off at the edges of the tiles.
* On the right of the screen, notice a fragment that was sequenced even though it does not overlap a tile.
* Zoom in on the exon that overlaps with tile 1746 by clicking and dragging over the region in the ruler at the top of the screen. If you are at location 5:55,085,643-55,090,501, this exon should be on the left side of the display.
* When you are sufficiently zoomed in, you will start to see individual nucleotides in the reads as colours. These are the nucleotides in the reads that differ from the reference.
* You will notice a column of differing nucleotides at one position in the exon. This is probably a SNP in the sample.
* Hover with the pointer over the name part of a BAM track >> right click >> select "View as pairs". Notice that the fragment size is not constant: some fragments are not fully covered by reads, some pairs of reads read into each other, and some pairs of reads overlap completely.
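The three situations described in the "View as pairs" step can be told apart using only the reference coordinates of the two mates. The Python sketch below illustrates this and adds an "adjacent" boundary case. It rests on three assumptions:

* coordinates are 1-based and inclusive, as shown in IGV;
* each mate aligns as a single ungapped block with no clipping;
* the names `Read`, `classify_pair` and `fragment_length` are hypothetical and do not belong to IGV or to any BAM library.

The fragment length is measured from the leftmost to the rightmost aligned base, which follows the same idea as the SAM TLEN field.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """One mate of a read pair, reduced to its reference interval (hypothetical model)."""

    start: int   # leftmost aligned reference position (1-based)
    length: int  # reference bases covered (assumes no clipping or indels)

    @property
    def end(self) -> int:
        """Last covered reference position (1-based, inclusive)."""
        return self.start + self.length - 1


def fragment_length(a: Read, b: Read) -> int:
    """Bases from the leftmost to the rightmost aligned base of the pair."""
    return max(a.end, b.end) - min(a.start, b.start) + 1


def classify_pair(a: Read, b: Read) -> str:
    """Describe how two mates relate: gap, adjacent, read-into or full-overlap."""
    left, right = sorted((a, b))  # order mates by start position (then length)
    inner_gap = right.start - left.end - 1  # unsequenced bases between the mates
    if inner_gap > 0:
        return "gap"           # middle of the fragment is not covered by reads
    if inner_gap == 0:
        return "adjacent"      # mates meet exactly, without overlapping
    if right.start == left.start or right.end <= left.end:
        return "full-overlap"  # one mate lies entirely within the other
    return "read-into"         # mates partially overlap


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the exercise BAM files.
    print(classify_pair(Read(100, 100), Read(300, 100)), fragment_length(Read(100, 100), Read(300, 100)))  # gap 300
    print(classify_pair(Read(100, 100), Read(150, 100)), fragment_length(Read(100, 100), Read(150, 100)))  # read-into 150
    print(classify_pair(Read(100, 100), Read(120, 50)), fragment_length(Read(100, 100), Read(120, 50)))    # full-overlap 100
```

In real data, the start and length of each mate would come from its POS and CIGAR fields. Clipping and indels change the reference span, so treat this only as a model of the picture IGV draws.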
# Opening "off the shelf" tracks

* File >> Load from server >> expand "Annotations" >> expand "Variation and repeats" >> tick "dbSNP" >> OK
* A dbSNP track should open. You might need to scroll down to see it.
* You can make the gene track shorter by hovering over the gene track name >> right click >> select "Collapsed".
* As already noted, you should be able to see a SNP in the exon you are looking at: it shows up as a vertical coloured bar in the coverage track.
* If you zoom in on this area, you will see that this SNP is already reported in dbSNP.

# Improvement in alignment through refinement

* Paste into the search box: 5:74,021,826-74,021,866
* If you cannot see all the reads on screen, hover with the pointer over the name part of the track >> right click >> select "Squished".
* Notice that the refined BAM file contains a slightly improved alignment. A read that had many mismatches in the raw alignment correctly contains a gap in the refined BAM file, and no mismatches.
* Notice that this deletion is recorded in dbSNP (see the dbSNP track loaded earlier).

# Sessions

* Here we want to save the state of the browser so that we can return to it at a later stage or share it with others.
* File >> Save session... >> give it a name and choose a location >> OK
* When you have successfully completed the above, close IGV.
* Restart IGV as you did at the beginning of the exercise.
* File >> Load session... >> locate the session file you just created >> OK
* Magic! The browser should reload all tracks and information as they were when you saved the session.
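In the alignment refinement step, the improvement is recorded in the read's CIGAR string in the BAM file. The raw alignment presumably places the read as one ungapped block. The refined alignment instead describes the gap with a deletion (`D`) operation. The minimal Python sketch below reads CIGAR strings using the operation codes of the SAM specification. The helper names, and the CIGAR strings and position in the usage lines, are hypothetical. They are not taken from the exercise BAM files.

```python
import re
from typing import List, Tuple

# Operations that advance along the reference / along the read (SAM specification).
REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '40M3D60M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Rebuild the string to reject characters the regex skipped over.
    if not ops or "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases the alignment covers."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar: str) -> int:
    """Number of read bases accounted for (hard-clipped bases excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


def deletions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """(first deleted reference position, length) for every 'D' operation.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    """
    ref = pos
    found = []
    for n, op in parse_cigar(cigar):
        if op == "D":
            found.append((ref, n))
        if op in REF_CONSUMING:
            ref += n
    return found


if __name__ == "__main__":
    raw = "100M"          # hypothetical raw alignment: one ungapped block
    refined = "40M3D60M"  # hypothetical refined alignment with a 3 bp deletion
    print(deletions(1000, raw))      # []
    print(deletions(1000, refined))  # [(1040, 3)]
```

Note that `M` covers both matches and mismatches. The mismatches of the raw alignment therefore do not appear in the CIGAR string itself. IGV colours them by comparing the read bases with the reference sequence.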
# VCF files

* File >> Load from URL >> http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv/exerResults/04_advancedPipelineWithFuncAnnot/snpsHQ.filt.vcf
* File >> Load from URL >> http://hts-nonsecure.uio.no/course_vc_2012_nb3_imbv/exerResults/04_advancedPipelineWithFuncAnnot/indelsHQ.filt.vcf
* The variant tracks appear at the top of the screen.
* You should already be roughly at 5:74,021,826-74,021,866, where you can see how the reads were used to correctly call the deletion.
* Navigate to: 5:55,086,316-55,087,002
* You should be able to see that this SNP was correctly called.

# Regions of interest

If you have time, see if you can figure out how to create your own list of regions of interest using the "Regions" menu and the red icon in the toolbar.
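One way to prepare a list of regions of interest outside IGV is to write the loci used in this exercise to a BED file. You can then import that file through the "Regions" menu. In the IGV versions we are aware of, the item is Regions >> Import Regions..., but check the menu in your version.

The main pitfall is the coordinate convention:

* the IGV locus box uses 1-based, inclusive coordinates with optional thousands separators;
* BED uses 0-based, half-open intervals.

So the start decreases by one and the end stays the same. A minimal Python sketch follows. The function names and region labels are hypothetical.

```python
import re
from typing import Tuple

# chrom:start-end, thousands separators allowed, as typed into IGV's locus box.
_LOCUS = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end), 1-based inclusive, for e.g. '5:1,000-2,000'."""
    m = _LOCUS.match(locus.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    chrom = m.group(1)
    start, end = (int(g.replace(",", "")) for g in m.group(2, 3))
    if start < 1 or start > end:
        raise ValueError(f"invalid interval in locus: {locus!r}")
    return chrom, start, end


def locus_to_bed(locus: str, name: str = "") -> str:
    """Convert a 1-based inclusive locus to one 0-based half-open BED line."""
    chrom, start, end = parse_locus(locus)
    fields = [chrom, str(start - 1), str(end)]  # BED start is 0-based, end exclusive
    if name:
        fields.append(name)  # optional 4th BED column (name)
    return "\t".join(fields)


if __name__ == "__main__":
    # Loci from this exercise; the region names are hypothetical labels.
    regions = [
        ("5:55,085,643-55,090,501", "start_view"),
        ("5:74,021,826-74,021,866", "deletion"),
        ("5:55,086,316-55,087,002", "snp"),
    ]
    for locus, name in regions:
        print(locus_to_bed(locus, name))
```

You might run it as a script, for example `python loci_to_bed.py > regions.bed` (the file names are hypothetical). This produces a tab-separated BED file with a name column for each region.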