Variant calling on Abel

From mn/bio/cees-bioinf
Revision as of 19:26, 31 July 2014 by Michamat@uio.no (talk | contribs) (Preparation)

Intro

This page describes a set of shell scripts that we have developed to run a SNP calling pipeline for population-level resequencing data. The pipeline includes GATK's HaplotypeCaller, FreeBayes, and Samtools' mpileup. It is tailored for use with Abel in the sense that it submits a large number of Slurm scripts in parallel to make efficient use of Abel's parallel computing power. The aim is to be able to start the entire pipeline with just a few clicks, but this is still work in progress. Here's what we've got so far.

Preparation

To allow as much automation as possible, the pipeline expects standardized file names for the input sequence files, following the format SAMPLE_LIBRARY_REP.fastq.gz, where SAMPLE is an identifier for the sampled individual, LIBRARY is an identifier for the DNA library extracted from that individual, and REP is an identifier for the mate pair replicate, if applicable. The three identifiers must be separated by underscores, and the file name must contain no additional underscores (underscores in the path are fine). All fastq.gz files should sit in the same directory somewhere on /work/users/. For example, /work/users/michaelm/aqua_genome/Working/analysis/data/ looks like this:

ls -l /work/users/michaelm/aqua_genome/Working/analysis/data/
-rwx------ 1 michaelm users 1855831335 Jul 30 13:47 L01Y007_L001_R1.fastq.gz
-rwx------ 1 michaelm users 1856424663 Jul 30 13:47 L01Y007_L001_R2.fastq.gz
-rwx------ 1 michaelm users 1837673868 Jul 30 13:47 L01Y007_L002_R1.fastq.gz
-rwx------ 1 michaelm users 1839282994 Jul 30 13:47 L01Y007_L002_R2.fastq.gz
-rwx------ 1 michaelm users 1277786926 Jul 30 13:47 L01Y009_L001_R1.fastq.gz
-rwx------ 1 michaelm users 1278239970 Jul 30 13:47 L01Y009_L001_R2.fastq.gz
-rwx------ 1 michaelm users 1266669523 Jul 30 13:47 L01Y009_L002_R1.fastq.gz
-rwx------ 1 michaelm users 1267661234 Jul 30 13:47 L01Y009_L002_R2.fastq.gz
-rwx------ 1 michaelm users 1397131138 Jul 30 13:47 L01Y013_L001_R1.fastq.gz
-rwx------ 1 michaelm users 1397796506 Jul 30 13:47 L01Y013_L001_R2.fastq.gz
...
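
Before starting the pipeline, it can be useful to confirm that every file name really follows the SAMPLE_LIBRARY_REP.fastq.gz convention. The following is a minimal sketch of such a check and is not part of the pipeline scripts themselves; the function names parse_fastq_name and check_fastq_dir are hypothetical and were chosen only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the pipeline: split a file name of the
# form SAMPLE_LIBRARY_REP.fastq.gz into its three identifiers.
# Prints "SAMPLE LIBRARY REP" on success, returns non-zero otherwise.
parse_fastq_name() {
    base=$(basename "$1")            # underscores in the path are ignored
    case "$base" in
        *.fastq.gz) stem=${base%.fastq.gz} ;;
        *) return 1 ;;
    esac
    # Require exactly three non-empty, underscore-separated fields.
    case "$stem" in
        *_*_*_*|_*|*_|*__*) return 1 ;;
        *_*_*) ;;
        *) return 1 ;;
    esac
    sample=${stem%%_*}
    rest=${stem#*_}
    library=${rest%%_*}
    rep=${rest#*_}
    echo "$sample $library $rep"
}

# Hypothetical helper: report every *.fastq.gz file in a directory whose
# name does not follow the convention. Returns non-zero if any is found.
check_fastq_dir() {
    status=0
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue      # directory contains no fastq.gz files
        if ! parse_fastq_name "$f" > /dev/null; then
            echo "Nonconforming file name: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

After sourcing these functions, calling check_fastq_dir with the data directory as its argument prints the offending file names to standard error, if there are any, and returns a non-zero exit status in that case.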