SMRT Analysis: The params.xml protocol file

From mn/ibv/bioinfwiki
Latest revision as of 16:09, 9 March 2015

Introduction

The protocol params.xml file specifies the algorithms (and their parameters) run in the smrtpipe pipeline. In addition to protocol files, the smrtanalysis package also contains related protocol template files. These are used by the smrtPortal web server to display a given protocol visually, giving the user the ability to change certain parameters. When a job is submitted in the smrtPortal, a protocol params.xml file is produced from the template file.

Protocol files

There are two main sections in this file: one <global> section and a number of <module> sections. Variables that apply to the whole smrtpipe run, such as the job name or a reference genome file, can be defined inside the <global> section. The <module> sections define the actual algorithms to be run. The modules run in the order in which they appear in the params.xml file, and typically contain parameters defined in <param> sections.

The following is an example of a short params.xml file. “P_Fetch” is a parameter-less module included in all protocol files; it fetches the read files specified in the input.xml file. The “P_Filter” module then filters these reads according to the given parameters:


<?xml version="1.0"?>
<smrtpipeSettings>
  <global>
    <param name="version">
      <value>3</value>
    </param>
  </global>
  <module name="P_Fetch">
  </module>
  <module name="P_Filter">
    <param name="minSubReadLength">
      <value>500</value>
    </param>
    <param name="minLength">
      <value>100</value>
    </param>
    <param name="readScore">
      <value>0.80</value>
    </param>
  </module>
</smrtpipeSettings>


The output of this params.xml file will include the filtered reads in fasta and fastq format (such as “/outputFolder/data/filtered_subreads.fasta”), in addition to a filtering report (“/outputFolder/data/filtered_summary.csv”). The smrtpipe user guide (https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.3.0) contains details about all available modules, such as their input and output. (Note, however, that the fasta/fastq output filenames are not given in that guide, and that the guide slightly misspells the filter summary filename.)
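Because modules run in the order in which they appear, it can help to inspect a params.xml before submitting it. The following is a minimal Python sketch, not part of smrtanalysis, that uses only the standard library. It lists the <global> parameters, then each module with its parameters in file order, for the short example above. It accepts either a `name` or an `id` attribute on <module>, since both forms appear on this page.

```python
import xml.etree.ElementTree as ET

# The short params.xml example from this page, embedded so the sketch runs on its own.
PARAMS_XML = """<?xml version="1.0"?>
<smrtpipeSettings>
  <global>
    <param name="version"><value>3</value></param>
  </global>
  <module name="P_Fetch">
  </module>
  <module name="P_Filter">
    <param name="minSubReadLength"><value>500</value></param>
    <param name="minLength"><value>100</value></param>
    <param name="readScore"><value>0.80</value></param>
  </module>
</smrtpipeSettings>
"""


def params_of(element):
    """Map each <param name=...> child of element to the text of its <value>."""
    return {p.get("name"): (p.findtext("value") or "").strip()
            for p in element.findall("param")}


def summarize(xml_text):
    """Return the <global> params and the modules (with params) in file order."""
    root = ET.fromstring(xml_text)
    global_elem = root.find("global")
    global_params = params_of(global_elem) if global_elem is not None else {}
    # This page's examples identify modules by either a "name" or an "id" attribute.
    modules = [(m.get("name") or m.get("id"), params_of(m))
               for m in root.findall("module")]
    return global_params, modules


if __name__ == "__main__":
    global_params, modules = summarize(PARAMS_XML)
    print("global:", global_params)
    for position, (name, params) in enumerate(modules, start=1):
        print(position, name, params)
```

To check a real file instead, pass it in with `summarize(open("params.xml").read())`.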

Protocol template files

The template XML files resemble the protocol files, but their two main sections are a single <protocol> section and multiple <moduleStage> sections. The <protocol> section defines the graphical user interface elements used to specify various parameters describing the run, such as the job name. For instance, the <param name="name" label="Protocol Name"> section will cause the rendering of a textbox (<input type="text"/>) that accepts text (as opposed to, for instance, numbers), with a “Protocol Name” label. The <moduleStage> sections define the algorithms available for the selected protocol template:


<?xml version="1.0" encoding="utf-8"?><smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
    <param name="name" label="Protocol Name">
      <value>RS_HGAP_Assembly</value>
      <input type="text"/>
      <rule required="true"/>
    </param>
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
      <select multiple="true">
        <import contentType="text/directory" extension="xml">common/protocols/filtering</import>
      </select>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>
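One way to see which file fills each <moduleStage> is to pair every stage with the <param> of the same name inside <protocol>. The following is a minimal Python sketch of that lookup. It assumes, as in the HGAP template above, that stage names and <param> names match. The returned paths are relative to the smrtanalysis installation directory, and the `editable` and GUI attributes are ignored.

```python
import xml.etree.ElementTree as ET

# The template example from this page, embedded so the sketch runs on its own.
TEMPLATE_XML = """<?xml version="1.0"?>
<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
    <param name="name" label="Protocol Name">
      <value>RS_HGAP_Assembly</value>
      <input type="text"/>
      <rule required="true"/>
    </param>
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
      <select multiple="true">
        <import contentType="text/directory" extension="xml">common/protocols/filtering</import>
      </select>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>
"""


def module_stage_sources(xml_text):
    """Return [(stage name, module file path)] in the order the stages appear."""
    root = ET.fromstring(xml_text)
    protocol = root.find("protocol")
    # Assumption: <param name="X"><value>path</value> inside <protocol> names the file for stage X.
    paths = {p.get("name"): (p.findtext("value") or "").strip()
             for p in protocol.findall("param")}
    return [(stage.get("name"), paths.get(stage.get("name")))
            for stage in root.findall("moduleStage")]


if __name__ == "__main__":
    for stage, path in module_stage_sources(TEMPLATE_XML):
        print(f"{stage}: {path}")
```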

Obtaining protocol files

PacBio clearly intends its customers to use smrtpipe through the graphical user interface provided by the smrtPortal. Since this web server is only available for download to direct PacBio customers, many users will not have access to it. At UoO, the smrtPortal is available only to members of CEES, and even they need to copy the smrtPortal-generated params.xml files onto Abel and start smrtpipe manually with the correct input.xml and params.xml files.

Users without access to the smrtPortal still have several options. An important source of protocol files is the example folder that is part of the smrtanalysis installation:

 cluster/software/VERSIONS/smrtanalysis-2.3.0/doc/examples/

Its subfolders contain several example params.xml files, often with instructive comments about parameter settings, module alternatives, etc. Users can simply copy a suitable example params.xml to a local folder and then change the relevant parameters.

Also, the smrtpipe user guide contains examples of params.xml files (or parts thereof):

https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.3.0
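For the copy-and-edit route, a small script can change parameter values and report a misspelt module or parameter name instead of silently ignoring it. This is a hedged sketch, not a PacBio tool. `edit_params` is a hypothetical helper, and the embedded example is a cut-down stand-in rather than one of the real files under doc/examples/. The sketch keeps comments inside <smrtpipeSettings> (Python 3.8+ is assumed), because ElementTree's default parser would drop the instructive comments the example files contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for a copied example params.xml.
EXAMPLE_XML = """<?xml version="1.0"?>
<smrtpipeSettings>
  <module name="P_Filter">
    <!-- an instructive comment from the example file -->
    <param name="minSubReadLength">
      <value>500</value>
    </param>
  </module>
</smrtpipeSettings>
"""


def edit_params(xml_text, changes):
    """Apply {(module name, param name): new value} and return the new XML text.

    Raises KeyError for a (module, param) pair that has no <value> in the file,
    so a misspelt name is reported instead of silently ignored.
    """
    # insert_comments=True (Python 3.8+) keeps comments inside the root element;
    # ElementTree's default parser would drop them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    for (module_name, param_name), new_value in changes.items():
        value = None
        for module in root.findall("module"):
            if (module.get("name") or module.get("id")) != module_name:
                continue
            for param in module.findall("param"):
                if param.get("name") == param_name:
                    value = param.find("value")
        if value is None:
            raise KeyError((module_name, param_name))
        value.text = str(new_value)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(edit_params(EXAMPLE_XML, {("P_Filter", "minSubReadLength"): 1000}))
```

In practice, one would first copy an example to a local folder (for instance with `cp`). Then read the copy's text, pass it through `edit_params`, and write the result back.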


Creating protocol files from template files 

Finally, the protocol template files contain the information necessary to create valid params.xml files. These template files are located at 

/cluster/software/VERSIONS/smrtanalysis-2.3.0/common/protocols/

The key to creating params.xml files from template files is replacing each <moduleStage> section with a matching <module> section. The files containing these <module> sections are specified in the <protocol> section of the template file.

For instance, the <moduleStage name="filtering" editable="true"/> section is referred to in the <param name="filtering"> section further up in the template file. Here, the <value> tag points to the path “common/protocols/filtering/PreAssemblerSFilter.1.xml”. Thus, we have to replace the text “<moduleStage name="filtering" editable="true"/>” with the content of the file “/cluster/software/VERSIONS/smrtanalysis-2.3.0/common/protocols/filtering/PreAssemblerSFilter.1.xml”.

However, the “<?xml version="1.0"?>”, “<smrtpipeSettings>” and “</smrtpipeSettings>” tags must be left out of the copy, since the template file already contains them. Also, for clarity, we only need to retain the <param> sections with their <value> tags; the other tags are used for rendering the GUI in the smrtPortal and are not used by smrtpipe. Finally, we remove the <param name="filtering"> section from the <protocol> section. Doing the same with the <moduleStage name="fetch" editable="false"/> section gives us:


<?xml version="1.0" encoding="utf-8"?><smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
    <param name="name" label="Protocol Name">
      <value>RS_HGAP_Assembly</value>
      <input type="text"/>
      <rule required="true"/>
    </param>
  </protocol>
  <module id="P_Fetch" label="Fetch v1" editableInJob="true" >
  </module>
  <module id="P_Filter" label="PreAssembler Filter v1" editableInJob="true" >
    <param name="minSubReadLength" label="Minimum Subread Length">
      <value>500</value>
    </param>
    <param name="readScore" label="Minimum Polymerase Read Quality">
      <value>0.80</value>
    </param>
    <param name="minLength" label="Minimum Polymerase Read Length">
      <value>100</value>
    </param>
  </module>
  <module id="P_FilterReports" label="PreAssemblerSFilter Reports v1" />
</smrtpipeSettings>
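The manual splicing just described can also be scripted. The following Python sketch is one possible approach under stated assumptions, not PacBio's procedure. `load_module_file` is a hypothetical hook; on Abel it would read the path relative to /cluster/software/VERSIONS/smrtanalysis-2.3.0/. The two module files below are invented, cut-down stand-ins. Like the manual recipe, the sketch:

- copies only the <module> elements, leaving behind the XML declaration and <smrtpipeSettings> wrapper of the module file;
- keeps only <param>/<value> inside those elements;
- deletes the stage's <param> from <protocol>.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-ins for the files named in the template;
# the real files under common/protocols/ contain more parameters and GUI tags.
MODULE_FILES = {
    "common/protocols/preprocessing/Fetch.1.xml": """<?xml version="1.0"?>
<smrtpipeSettings>
  <module id="P_Fetch" label="Fetch v1" editableInJob="true"/>
</smrtpipeSettings>""",
    "common/protocols/filtering/PreAssemblerSFilter.1.xml": """<?xml version="1.0"?>
<smrtpipeSettings>
  <module id="P_Filter" label="PreAssembler Filter v1" editableInJob="true">
    <param name="minSubReadLength" label="Minimum Subread Length">
      <value>500</value>
      <input type="text"/>
    </param>
  </module>
  <module id="P_FilterReports" label="PreAssemblerSFilter Reports v1"/>
</smrtpipeSettings>""",
}

# Abbreviated version of the template shown earlier on this page.
TEMPLATE_XML = """<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>"""


def stripped_modules(module_xml):
    """Return the file's <module> elements with only <param>/<value> children."""
    modules = ET.fromstring(module_xml).findall("module")
    for module in modules:
        for child in list(module):
            if child.tag != "param":
                module.remove(child)
                continue
            for grandchild in list(child):
                if grandchild.tag != "value":  # GUI-only tags such as <input>
                    child.remove(grandchild)
    return modules


def splice(template_xml, load_module_file):
    """Replace each <moduleStage> with the stripped modules of its file."""
    root = ET.fromstring(template_xml)
    protocol = root.find("protocol")
    stage_params = {p.get("name"): p for p in protocol.findall("param")}
    for stage in root.findall("moduleStage"):
        param = stage_params[stage.get("name")]  # KeyError if no matching param
        position = list(root).index(stage)
        root.remove(stage)
        modules = stripped_modules(load_module_file(param.findtext("value").strip()))
        for offset, module in enumerate(modules):
            root.insert(position + offset, module)
        protocol.remove(param)  # the stage's <param> is no longer needed
    return root


if __name__ == "__main__":
    spliced = splice(TEMPLATE_XML, MODULE_FILES.__getitem__)
    print(ET.tostring(spliced, encoding="unicode"))
```

As in the hand-made result, attributes such as `label` are kept. Only child tags other than <param> and <value> are removed.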


If we now remove the whole <protocol> section (from “<protocol ...>” to “</protocol>”) and insert a <global> section instead, we are essentially left with the params.xml file given above.
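The final step, swapping <protocol> for <global>, can be sketched the same way. Assumptions:

- The only global parameter is `version` = 3, copied from the short params.xml example earlier on this page. A real run may need more, such as a reference genome.
- `protocol_to_global` is a hypothetical helper.
- `ET.indent` needs Python 3.9 or later.

The modules keep their `id` attributes, whereas the hand-written example uses `name`. The sketch does not change them, so compare with a working params.xml if smrtpipe rejects the result.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the spliced template shown above.
SPLICED_XML = """<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
  </protocol>
  <module id="P_Fetch" label="Fetch v1" editableInJob="true"/>
  <module id="P_Filter" label="PreAssembler Filter v1" editableInJob="true">
    <param name="minSubReadLength" label="Minimum Subread Length">
      <value>500</value>
    </param>
  </module>
</smrtpipeSettings>"""


def protocol_to_global(xml_text, global_params):
    """Replace <protocol> with a <global> section holding global_params."""
    root = ET.fromstring(xml_text)
    protocol = root.find("protocol")
    position = 0
    if protocol is not None:
        position = list(root).index(protocol)
        root.remove(protocol)
    global_elem = ET.Element("global")
    for name, value in global_params.items():
        param = ET.SubElement(global_elem, "param", name=name)
        ET.SubElement(param, "value").text = str(value)
    root.insert(position, global_elem)
    ET.indent(root)  # Python 3.9+: tidy the indentation before writing
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # version=3 is taken from the short params.xml example on this page.
    print(protocol_to_global(SPLICED_XML, {"version": 3}))
```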

Creating params.xml files from template files will take some trial and error; the procedure given here is probably not always sufficient. It is advisable to try to obtain valid params.xml files by other means before creating them yourself!