Difference between revisions of "PBcR: Output folder structure"

(Created page with "Output folder structure")
 

Revision as of 15:04, 4 May 2015

PBcR writes its output files and folders relative to the current folder (i.e. the folder from which it was started). This folder is called the <output> folder here. If PBcR is started from this folder, the master log file also resides directly under it (provided the log file is created as shown here). PBcR uses the <libraryname> (specified with the -l option) as part of various file and folder names.

The temporary folder

Directly under the output folder a temporary folder is created:

<output>/temp<libraryname>

Corrected reads

The corrected reads are found directly under the <output> folder, in files prefixed with the <libraryname>:

<libraryname>.fasta and <libraryname>.fastq contain the corrected reads in fasta and fastq format, respectively

<libraryname>.longest25.fastq contains the longest 25X subset of the corrected reads

<libraryname>.correction.hist contains histogram information for the read correction
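
The naming conventions described so far can be sketched in a few lines of Python. This is only an illustration that mirrors the layout on this page; the function name pbcr_read_paths and the library name "mylib" are hypothetical and not part of PBcR.

```python
from pathlib import Path


def pbcr_read_paths(output, libraryname):
    """Return the temporary folder and corrected-read files, keyed by label.

    Hypothetical helper: it only reproduces the naming scheme described
    on this page and does not call PBcR itself.
    """
    output = Path(output)
    return {
        "temp": output / f"temp{libraryname}",
        "fasta": output / f"{libraryname}.fasta",
        "fastq": output / f"{libraryname}.fastq",
        "longest25": output / f"{libraryname}.longest25.fastq",
        "hist": output / f"{libraryname}.correction.hist",
    }


if __name__ == "__main__":
    # "mylib" stands in for whatever name was passed with -l;
    # "." assumes the script is run from the <output> folder.
    for label, path in pbcr_read_paths(".", "mylib").items():
        status = "found" if path.exists() else "missing"
        print(f"{label:10s} {path} ({status})")
```

Run from the <output> folder, this reports which of the expected files and folders are present, which may be a quick way to see how far a run has progressed.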

The Celera Assembler (CA) output

The assembly step of the PBcR pipeline is run by the CA program, which writes its output to <output>/<libraryname>. The CA output consists of 9 folders, reflecting the progress of the program. The most interesting one is therefore the last, the 9-terminator folder, which holds the final assembly:

<output>/<libraryname>/9-terminator

This folder contains the final assembly in the asm format, which is the native CA format. The asm.ctg.fasta file contains the assembly in fasta format. The asm.qc file contains the corresponding assembly quality information.
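
As a rough sketch of how the final assembly could be located from a script, the Python below builds the paths named above and counts the contigs in the fasta file. The helper names final_assembly_files and count_fasta_records, and the library name "mylib", are hypothetical; only the folder and file names come from this page.

```python
from pathlib import Path


def final_assembly_files(output, libraryname):
    """Paths of the final assembly files, following the layout above."""
    terminator = Path(output) / libraryname / "9-terminator"
    return {
        "folder": terminator,
        "contigs": terminator / "asm.ctg.fasta",
        "qc": terminator / "asm.qc",
    }


def count_fasta_records(fasta_path):
    """Count sequences by counting header lines (those starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # "mylib" stands in for whatever name was passed with -l;
    # "." assumes the script is run from the <output> folder.
    files = final_assembly_files(".", "mylib")
    if files["contigs"].exists():
        print(f"{files['contigs']}: {count_fasta_records(files['contigs'])} contigs")
    else:
        print(f"No final assembly found in {files['folder']}")
```

If the contig file is missing, the assembly step has most likely not finished, and the other CA folders under <output>/<libraryname> show how far it got.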