Difference between revisions of "Variant calling on Abel"
(Created page with "==Software installation== Programs that are used by a large number of users at CEES are generally maintained by our IT support, who also creates 'modules' for each program ins...") |
|||
Line 1: | Line 1: | ||
− | == | + | ==Intro== |
− | + | This site describes a number of shell scripts that we have developed to run a SNP calling pipeline for population-level resequencing data, which includes [http://www.broadinstitute.org/gatk/ GATK]'s HaplotypeCaller, [https://github.com/ekg/freebayes FreeBayes], and [http://samtools.sourceforge.net Samtools]' [http://samtools.sourceforge.net/mpileup.shtml mpileup]. The pipeline is tailored for use with Abel in the sense that it triggers a large number of slurm scripts in parallel in order to efficiently use Abel's parallel computing power. The aim is to be able to start the entire pipeline with just a number of clicks, but this is still work in progress. Here's what we've got so far. | |
− | + | ||
− | + | ==Preparation== | |
− | <pre> | + | In order to allow the greatest extent of automatization, the pipeline expects standardized variable names for input sequence files according to the format SAMPLE_LIBRARY_REP.fastq.gz, where 'SAMPLE' should be an identifier for the sampled individual, 'LIBRARY' should be an identifier for the DNA library extracted from that individual, and REP should be an identifier for the mate pair replicate, if applicable. All fastq.gz files should sit in the same directory somewhere on /work/users/. For example, /work/users/michaelm/aqua_genome/Working/analysis/data/ looks like this |
− | + | <pre> | |
− | + | ls -l /work/users/michaelm/aqua_genome/Working/analysis/data/ | |
− | + | -rwx------ 1 michaelm users 1855831335 Jul 30 13:47 L01Y007_L001_R1.fastq.gz | |
+ | -rwx------ 1 michaelm users 1856424663 Jul 30 13:47 L01Y007_L001_R2.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1837673868 Jul 30 13:47 L01Y007_L002_R1.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1839282994 Jul 30 13:47 L01Y007_L002_R2.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1277786926 Jul 30 13:47 L01Y009_L001_R1.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1278239970 Jul 30 13:47 L01Y009_L001_R2.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1266669523 Jul 30 13:47 L01Y009_L002_R1.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1267661234 Jul 30 13:47 L01Y009_L002_R2.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1397131138 Jul 30 13:47 L01Y013_L001_R1.fastq.gz | ||
+ | -rwx------ 1 michaelm users 1397796506 Jul 30 13:47 L01Y013_L001_R2.fastq.gz | ||
+ | ... | ||
+ | </pre> |
Revision as of 19:24, 31 July 2014
Intro
This site describes a number of shell scripts that we have developed to run a SNP calling pipeline for population-level resequencing data, which includes GATK's HaplotypeCaller, FreeBayes, and Samtools' mpileup. The pipeline is tailored for use with Abel in the sense that it triggers a large number of slurm scripts in parallel in order to efficiently use Abel's parallel computing power. The aim is to be able to start the entire pipeline with just a number of clicks, but this is still work in progress. Here's what we've got so far.
Preparation
In order to allow the greatest extent of automatization, the pipeline expects standardized variable names for input sequence files according to the format SAMPLE_LIBRARY_REP.fastq.gz, where 'SAMPLE' should be an identifier for the sampled individual, 'LIBRARY' should be an identifier for the DNA library extracted from that individual, and REP should be an identifier for the mate pair replicate, if applicable. All fastq.gz files should sit in the same directory somewhere on /work/users/. For example, /work/users/michaelm/aqua_genome/Working/analysis/data/ looks like this
ls -l /work/users/michaelm/aqua_genome/Working/analysis/data/ -rwx------ 1 michaelm users 1855831335 Jul 30 13:47 L01Y007_L001_R1.fastq.gz -rwx------ 1 michaelm users 1856424663 Jul 30 13:47 L01Y007_L001_R2.fastq.gz -rwx------ 1 michaelm users 1837673868 Jul 30 13:47 L01Y007_L002_R1.fastq.gz -rwx------ 1 michaelm users 1839282994 Jul 30 13:47 L01Y007_L002_R2.fastq.gz -rwx------ 1 michaelm users 1277786926 Jul 30 13:47 L01Y009_L001_R1.fastq.gz -rwx------ 1 michaelm users 1278239970 Jul 30 13:47 L01Y009_L001_R2.fastq.gz -rwx------ 1 michaelm users 1266669523 Jul 30 13:47 L01Y009_L002_R1.fastq.gz -rwx------ 1 michaelm users 1267661234 Jul 30 13:47 L01Y009_L002_R2.fastq.gz -rwx------ 1 michaelm users 1397131138 Jul 30 13:47 L01Y013_L001_R1.fastq.gz -rwx------ 1 michaelm users 1397796506 Jul 30 13:47 L01Y013_L001_R2.fastq.gz ...