SMRT Analysis: The params.xml protocol file

From mn/ibv/bioinfwiki
Revision as of 15:09, 9 March 2015 by Ralfne@uio.no (talk | contribs)


Introduction

The protocol params.xml file specifies the algorithms (and their parameters) run in the smrtpipe pipeline. In addition to protocol files, the smrtanalysis package also contains related protocol template files. These are used in the smrtPortal web server to display a given protocol visually, giving the user the ability to change certain parameters. When submitting a job in the smrtPortal, protocol params.xml files are produced from the template files.

Protocol files

There are two main kinds of section in this file: one <global> section and a number of <module> sections. Inside the <global> section, variables that apply to the whole smrtpipe run can be defined, for instance the job name or a reference genome file. The <module> sections define the actual algorithms to be run. They run in the order given in the params.xml file, and typically contain parameters defined in <param> sections.

The following is an example of a short params.xml file. “P_Fetch” is a parameter-less module included in all protocol files; it fetches the read files specified in the input.xml file. The “P_Filter” module filters these reads according to the given parameters:


<?xml version="1.0"?>
<smrtpipeSettings>
  <global>
    <param name="version">
      <value>3</value>
    </param>
  </global>
  <module name="P_Fetch">
  </module>
  <module name="P_Filter">
    <param name="minSubReadLength">
      <value>500</value>
    </param>
    <param name="minLength">
      <value>100</value>
    </param>
    <param name="readScore">
      <value>0.80</value>
    </param>
  </module>
</smrtpipeSettings>


The output of this params.xml file will include the filtered reads in fasta and fastq format (such as “/outputFolder/data/filtered_subreads.fasta”), in addition to a filtering report (“/outputFolder/data/filtered_summary.csv”). The smrtpipe user guide (https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.3.0) contains details about all available modules, such as their input and output. (Note, however, that the fasta/fastq output filenames are not given in that guide; also, the filter summary filename is slightly misspelt there.)
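
To check which modules a params.xml file will run, and with which parameters, the file can be read with a few lines of Python. The following is only a minimal sketch using the standard library; the function name read_params is made up for this illustration, and the XML text is a compact copy of the example above:

```python
import xml.etree.ElementTree as ET

# Compact copy of the example params.xml shown above.
PARAMS_XML = """<?xml version="1.0"?>
<smrtpipeSettings>
  <global>
    <param name="version"><value>3</value></param>
  </global>
  <module name="P_Fetch"/>
  <module name="P_Filter">
    <param name="minSubReadLength"><value>500</value></param>
    <param name="minLength"><value>100</value></param>
    <param name="readScore"><value>0.80</value></param>
  </module>
</smrtpipeSettings>
"""


def read_params(xml_text):
    """Return (global_params, modules); modules keep the order of the file.

    read_params is a hypothetical helper, not part of smrtanalysis.
    """
    root = ET.fromstring(xml_text)

    def params_of(section):
        # Map each <param name="..."> to the text of its <value> child.
        return {p.get("name"): p.findtext("value") for p in section.findall("param")}

    global_section = root.find("global")
    global_params = params_of(global_section) if global_section is not None else {}
    # Some files identify modules with "id" instead of "name"; accept both.
    modules = [(m.get("name") or m.get("id"), params_of(m))
               for m in root.findall("module")]
    return global_params, modules


global_params, modules = read_params(PARAMS_XML)
print("global:", global_params)
for name, params in modules:
    print(name, params)
```

All values are returned as strings, exactly as they appear in the file.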

Protocol template files

The template XML files resemble the protocol files, but their two main kinds of section are a single <protocol> section and multiple <moduleStage> sections. The <protocol> section defines the graphical user interface elements used to specify various parameters describing the run, such as the job name. For instance, <param name="name" label="Protocol Name"> causes the rendering of a textbox (<input type="text"/>) that accepts text (as opposed to, for instance, numbers), with a “Protocol Name” label. The <moduleStage> sections define the algorithms available for the selected protocol template:


<?xml version="1.0" encoding="utf-8"?>
<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
    <param name="name" label="Protocol Name">
      <value>RS_HGAP_Assembly</value>
      <input type="text"/>
      <rule required="true"/>
    </param>
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
      <select multiple="true">
        <import contentType="text/directory" extension="xml">common/protocols/filtering</import>
      </select>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>
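
In the template above, each <moduleStage> shares its name with a <param> inside the <protocol> section, and the <value> of that <param> holds a relative file path. A small Python sketch can list these pairs. The function name stage_files is hypothetical, and the template text is a shortened copy of the listing above:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the template shown above (GUI-only tags left out).
TEMPLATE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>
"""


def stage_files(template_bytes):
    """Return [(stage name, relative path or None)] in template order.

    stage_files is a hypothetical helper, not part of smrtanalysis.
    """
    root = ET.fromstring(template_bytes)
    protocol = root.find("protocol")
    stages = []
    for stage in root.findall("moduleStage"):
        name = stage.get("name")
        # Look up the <param> with the same name inside <protocol>.
        value = protocol.findtext("param[@name='%s']/value" % name)
        stages.append((name, value.strip() if value else None))
    return stages


for name, path in stage_files(TEMPLATE_XML):
    print(name, "->", path)
```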

Obtaining protocol files

PacBio clearly intends its customers to use the smrtpipe through the graphical user interface provided by the smrtPortal. Since this web server is only available for download by direct customers of PacBio, users will often not have access to it. At UiO, the smrtPortal is available only to members of CEES, and even they need to copy the smrtPortal-generated params.xml files onto Abel and start the smrtpipe manually with the correct input.xml and params.xml files.

If unable to use the smrtPortal, several other options remain. An important source of protocol files is the example folder that is part of the smrtanalysis installation:

 /cluster/software/VERSIONS/smrtanalysis-2.3.0/doc/examples/

Several example params.xml files are available in its subfolders, often with instructive comments about parameter settings, module alternatives etc. Users may simply copy a suitable example params.xml file to a local folder and then change the relevant parameters.
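
Changing a parameter in a copied file can be done in any text editor, but it can also be scripted. The following Python sketch changes one <value> inside one <module>. The helper set_param is hypothetical, and the XML text is a stand-in for a copied example file, not an actual file from the examples folder:

```python
import xml.etree.ElementTree as ET

# Stand-in for the content of a copied example params.xml file.
EXAMPLE_PARAMS = """<smrtpipeSettings>
  <module name="P_Fetch"/>
  <module name="P_Filter">
    <param name="minSubReadLength"><value>500</value></param>
    <param name="readScore"><value>0.80</value></param>
  </module>
</smrtpipeSettings>
"""


def set_param(root, module_name, param_name, new_value):
    """Set the <value> of one <param> in one <module>; raise KeyError if absent.

    set_param is a hypothetical helper, not part of smrtanalysis.
    """
    for module in root.findall("module"):
        # Modules may be identified by "name" or by "id".
        if module_name not in (module.get("name"), module.get("id")):
            continue
        for param in module.findall("param"):
            if param.get("name") == param_name:
                value = param.find("value")
                if value is None:
                    value = ET.SubElement(param, "value")
                value.text = str(new_value)
                return
    raise KeyError("%s/%s not found" % (module_name, param_name))


root = ET.fromstring(EXAMPLE_PARAMS)
set_param(root, "P_Filter", "minSubReadLength", 1000)
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

For a file on disk, ET.parse(path) followed by tree.write(path, encoding="utf-8", xml_declaration=True) works the same way. Note that ElementTree discards XML comments by default when parsing, so the instructive comments of the example files would be lost in the rewritten copy.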

Also, the smrtpipe user guide contains examples of params.xml files (or parts thereof):

https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.3.0


Creating protocol files from template files 

Finally, the protocol template files contain the information necessary to create valid params.xml files. These template files are located at 

/cluster/software/VERSIONS/smrtanalysis-2.3.0/common/protocols/

The key to creating params.xml files from template files is to replace each <moduleStage> section with the matching <module> sections. These sections are found in the files specified in the <protocol> section of the template file.

For instance, the <moduleStage name="filtering" editable="true"/> section is referred to in the <param name="filtering"> section further up in the template file. Here, the <value> section points to the path “common/protocols/filtering/PreAssemblerSFilter.1.xml”. Thus, we have to replace the text “<moduleStage name="filtering" editable="true"/>” with the content of the file “/cluster/software/VERSIONS/smrtanalysis-2.3.0/common/protocols/filtering/PreAssemblerSFilter.1.xml”. However, the “<?xml version="1.0" ?>”, “<smrtpipeSettings>” and “</smrtpipeSettings>” tags must be left out when copying, since they are already present in the template file. Also, for clarity we only need to retain the <param> sections with their <value> tags; the other tags are used for rendering the GUI in the smrtPortal and are not used by smrtpipe. Finally, we remove the <param name="filtering"> section. Doing the same with the <moduleStage name="fetch" editable="false"/> section gives us:


<?xml version="1.0" encoding="utf-8"?>
<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <application>De novo assembly</application>
    <param name="name" label="Protocol Name">
      <value>RS_HGAP_Assembly</value>
      <input type="text"/>
      <rule required="true"/>
    </param>
  </protocol>
  <module id="P_Fetch" label="Fetch v1" editableInJob="true" >
  </module>
  <module id="P_Filter" label="PreAssembler Filter v1" editableInJob="true" >
    <param name="minSubReadLength" label="Minimum Subread Length">
      <value>500</value>
    </param>
    <param name="readScore" label="Minimum Polymerase Read Quality">
      <value>0.80</value>
    </param>
    <param name="minLength" label="Minimum Polymerase Read Length">
      <value>100</value>
    </param>
  </module>
  <module id="P_FilterReports" label="PreAssemblerSFilter Reports v1" />
</smrtpipeSettings>


If we now remove the whole <protocol> section (from “<protocol>” to “</protocol>”), and insert a <global> section instead, we are essentially left with the params.xml file given above.
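
The manual steps described in this section can be sketched in Python. This is only an illustration under several assumptions: build_params and load_file are hypothetical names, the two module files are invented stand-ins modelled on the listings above (the real files under common/protocols/ may contain more tags), and the <global> content is simply the version parameter from the first example:

```python
import xml.etree.ElementTree as ET

TEMPLATE = b"""<?xml version="1.0" encoding="utf-8"?>
<smrtpipeSettings>
  <protocol id="RS_HGAP_Assembly.3" version="2.3.0" editable="false">
    <param name="fetch" hidden="true">
      <value>common/protocols/preprocessing/Fetch.1.xml</value>
    </param>
    <param name="filtering">
      <value>common/protocols/filtering/PreAssemblerSFilter.1.xml</value>
    </param>
  </protocol>
  <moduleStage name="fetch" editable="false"/>
  <moduleStage name="filtering" editable="true"/>
</smrtpipeSettings>
"""

# Invented stand-ins for the module files; real files may differ.
MODULE_FILES = {
    "common/protocols/preprocessing/Fetch.1.xml": b"""<?xml version="1.0" ?>
<smrtpipeSettings>
  <module id="P_Fetch" label="Fetch v1" editableInJob="true"/>
</smrtpipeSettings>
""",
    "common/protocols/filtering/PreAssemblerSFilter.1.xml": b"""<?xml version="1.0" ?>
<smrtpipeSettings>
  <module id="P_Filter" label="PreAssembler Filter v1" editableInJob="true">
    <param name="minSubReadLength" label="Minimum Subread Length">
      <value>500</value>
      <input type="text"/>
    </param>
    <param name="readScore" label="Minimum Polymerase Read Quality">
      <value>0.80</value>
      <input type="text"/>
    </param>
  </module>
  <module id="P_FilterReports" label="PreAssemblerSFilter Reports v1"/>
</smrtpipeSettings>
""",
}


def build_params(template_bytes, load_file, global_params):
    """Build a params.xml tree from a template (hypothetical helper).

    load_file maps a relative path from the template to the file content.
    """
    template = ET.fromstring(template_bytes)
    protocol = template.find("protocol")
    out = ET.Element("smrtpipeSettings")
    # A <global> section takes the place of the <protocol> section.
    glob = ET.SubElement(out, "global")
    for name, value in global_params.items():
        param = ET.SubElement(glob, "param", name=name)
        ET.SubElement(param, "value").text = str(value)
    # Each <moduleStage> is replaced by the <module> sections of its file.
    for stage in template.findall("moduleStage"):
        stage_name = stage.get("name")
        rel_path = protocol.findtext("param[@name='%s']/value" % stage_name)
        if rel_path is None:
            raise ValueError("no file given for stage %r" % stage_name)
        stage_root = ET.fromstring(load_file(rel_path.strip()))
        for module in stage_root.findall("module"):
            new_module = ET.SubElement(out, "module", dict(module.attrib))
            for param in module.findall("param"):
                value = param.findtext("value")
                if value is None:
                    continue
                # Keep only the <value> tag; GUI-only tags are dropped.
                new_param = ET.SubElement(new_module, "param", dict(param.attrib))
                ET.SubElement(new_param, "value").text = value.strip()
    return out


params = build_params(TEMPLATE, MODULE_FILES.__getitem__, {"version": 3})
if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
    ET.indent(params)
print(ET.tostring(params, encoding="unicode"))
```

With a real installation, load_file would instead read the file at the given path below the smrtanalysis installation folder. The sketch keeps the attributes of the <module> and <param> tags unchanged, as in the intermediate listing above.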

Creating params.xml files from template files will take some trial and error; the procedures given here are probably not always sufficient. It is advisable to try to obtain valid params.xml files by other means before creating them yourself!