PBcR: Output folder structure

From mn/ibv/bioinfwiki

PBcR writes its output files and folders relative to the current working folder, i.e. the folder from which it was started. This folder is called the <output> folder here. When PBcR is started from this folder, the master log file also resides directly under it (provided the log file is created as shown here). PBcR uses the <libraryname> (specified with the -l option) as part of various file and folder names.

The temporary folder

Directly under the output folder a temporary folder is created:

<output>/temp<libraryname>

Corrected reads

The corrected reads are found directly under the <output> folder, in files prefixed with the <libraryname>:

<libraryname>.fasta and <libraryname>.fastq contain the corrected reads in FASTA and FASTQ format, respectively

<libraryname>.longest25.fastq contains the longest 25X subset of the corrected reads

<libraryname>.correction.hist contains histogram information for the read correction
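
As a minimal sketch, the file names listed above can be assembled and checked with a few lines of Python. The function names and the dictionary labels are hypothetical and not part of PBcR; only the file name patterns are taken from this page.

```python
from pathlib import Path


def corrected_read_files(output_dir, library_name):
    """Map a short label to the expected path of each corrected-read file."""
    out = Path(output_dir)
    return {
        "fasta": out / f"{library_name}.fasta",
        "fastq": out / f"{library_name}.fastq",
        "longest25": out / f"{library_name}.longest25.fastq",
        "histogram": out / f"{library_name}.correction.hist",
    }


def missing_files(paths):
    """Return the labels whose file does not exist on disk."""
    return [label for label, path in paths.items() if not path.is_file()]
```

For example, missing_files(corrected_read_files(".", "mylib")) lists which of the four files are absent from the current folder, where "mylib" stands in for the value given with the -l option.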

The Celera Assembler (CA) output

The output of the assembly step in the PBcR pipeline is written by the CA program and is located at <output>/<libraryname>. The CA output consists of 9 folders, reflecting the progress of the program. The most interesting folder is therefore the last one, 9-terminator, which holds the final assembly:

<output>/<libraryname>/9-terminator

This folder contains the final assembly in the asm format, which is the native CA format. The asm.ctg.fasta file contains the assembly in FASTA format, and the asm.qc file contains the corresponding assembly quality information.
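
To get a quick look at the final assembly, the FASTA file can be summarised with a short script. The following is only a sketch: the helper names are hypothetical, and it assumes that asm.ctg.fasta is a plain, uncompressed FASTA file in which header lines start with ">".

```python
from pathlib import Path


def final_assembly_dir(output_dir, library_name):
    """Build the path <output>/<libraryname>/9-terminator."""
    return Path(output_dir) / library_name / "9-terminator"


def fasta_summary(fasta_path):
    """Return (number of records, total sequence length) for a FASTA file."""
    records = 0
    total_length = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Each header line starts a new sequence record.
                records += 1
            elif line:
                # Every other non-empty line is sequence data.
                total_length += len(line)
    return records, total_length
```

Calling fasta_summary(final_assembly_dir(".", "mylib") / "asm.ctg.fasta") then returns the number of sequences and the total number of bases, with "mylib" again standing in for the actual library name.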