Difference between revisions of "Error correction of Illumina reads using Celera Assembler"

Line 1:

Removed:

It is usually useful to preprocess the data you have before assembly. There can be several issues with a dataset:

- adapter sequences
- errors
- duplicated reads

In this how-to, we will remove adapter sequences and errors from a large Illumina dataset.

Added: the three introductory paragraphs of the revision shown below.

Revision as of 15:04, 13 June 2013

It is usually useful to pre-process your data before assembly. An ideal dataset would be error-free, contain only genomic sequences, and contain no artificial duplications or chimeric sequences. Removing errors, non-genomic sequences, duplicates, and chimeras before assembly usually gives a better assembly, and the assembly process itself often runs faster.
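As a rough illustration of one of these clean-up steps, the sketch below drops reads whose sequence exactly duplicates an earlier read in a FASTQ file. It is not part of Celera Assembler or of this how-to's pipeline. The function names `parse_fastq` and `drop_exact_duplicates` and the example reads are made up for illustration. It assumes well-formed four-line FASTQ records and single-end reads, and it keeps every unique sequence in memory, so a real Illumina dataset would normally call for a dedicated tool.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (header, sequence, quality)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from FASTQ text, assuming exactly four lines per record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line carries no information we need
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def drop_exact_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep only the first read seen for each distinct sequence."""
    seen = set()
    kept = []
    for header, seq, qual in records:
        if seq in seen:
            continue  # an identical sequence was already kept
        seen.add(seq)
        kept.append((header, seq, qual))
    return kept


# Made-up example data: read2 has the same sequence as read1.
example = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIIIHHHH
@read3
TTGACCAA
+
IIIIIIII
""".splitlines()

unique = drop_exact_duplicates(parse_fastq(example))
print([header for header, _, _ in unique])
```

Only exact sequence matches are removed here. Duplicates that differ by a sequencing error, and paired-end reads where both mates must match, would need a more careful comparison.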

Some assemblers, like ALLPATHS-LG, are mostly self-contained and do not need any pre-processing of the data. Celera Assembler is also mostly self-contained, but it is more flexible about which modules to run and when, so you can often pre-process the data however you like before running an assembly on it.

For this how-to, I chose part of the budgerigar dataset that was used in Assemblathon 2 (http://assemblathon.org/AS2_download).



WORK IN PROGRESS.