Software and modules

Software installation

Programs that are used by a large number of users at CEES are generally maintained by our IT support, which also creates a 'module' for each program installed in /cluster/software. On abel and the cod nodes, the standard way to use such software is to load it as a module, e.g. with

module load samtools

The module system allows for different versions of the software. For example, writing

module avail samtools

will list the available versions:

samtools/0.1.18(default)
samtools/0.1.19

You can then override the default version like this:

module load samtools/0.1.19

A good explanation of how to use modules can be found here.

Occasionally, however, you might need software that is not already available on abel and that is used only by you or by a small group of researchers. In that case, you may not want to bother IT support with the installation and maintenance of that software; instead, you can install it yourself in the dedicated directory /projects/cees/bin/ (see here).

Modules

In order to keep /projects/cees/bin/ as organized as possible, it is recommended that programs are installed in sub-directories according to program name and version number, e.g. in /projects/cees/bin/freebayes/0.9.14/.

If software is installed this way in /projects/cees/bin/, it is still possible (and recommended) to use modules, as these facilitate the use of the software in SLURM scripts and increase reproducibility. In order to use modules, module files must be written and saved in the directory /projects/cees/bin/modules/, again using the program name as the name of a sub-directory, as in /projects/cees/bin/modules/freebayes/. The module file should be named according to the version number of the program, without a file extension. Thus, the file /projects/cees/bin/modules/freebayes/0.9.14 is a module file for version 0.9.14 of the program FreeBayes.

The content of the module file links the module with the installed software and is written in a language called Tcl. The commands in a module file are simple to understand, however, and existing module files can easily be copied and adapted (there is also a template module file in /projects/cees/bin/modules/ that you may copy). For example, the content of module file /projects/cees/bin/modules/freebayes/0.9.14 is as follows:

#%Module

## URL of application homepage.
set appurl     https://github.com/ekg/freebayes/

## Short description of module.
module-whatis "
Name:          FreeBayes
Description:   Bayesian haplotype-based polymorphism discovery and genotyping
Website:       https://github.com/ekg/freebayes/
Installed by:  Michael Matschiner"

## Commands.
set               root                 /projects/cees/bin/freebayes/0.9.14/
prepend-path      PATH                 $root

Here, the first line

#%Module

is required to make this file interpretable as a module file. All lines starting with ## are comments, and specification of the program URL in

set appurl     https://github.com/ekg/freebayes/

serves only as information for the user. The command

module-whatis

allows the user to find out more about this module by typing

module whatis freebayes

on the command line. To provide consistent information for each module, we recommend structuring the description as in the example above, with name, description, URL, and the name of the person who installed the module (you). This allows users of the module to contact you when questions about it arise.

The line

set               root                 /projects/cees/bin/freebayes/0.9.14/

defines a variable 'root' that holds the directory of the actual software installation, /projects/cees/bin/freebayes/0.9.14/, and with the following command

prepend-path      PATH                 $root

the directory name stored in this variable is prepended to the PATH variable, which is simply a list of directories that the system searches for executables whenever a program is called. Thus, after running

module load freebayes/0.9.14

you can simply type

freebayes

to start FreeBayes.
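
The steps described above can be tried out safely in a scratch directory. The following is a minimal sketch, not the actual installation procedure: the scratch prefix, the stand-in executable, and the shortened module-whatis text are assumptions made for illustration. It creates the directory layout, writes a module file with the same structure as the example, and then reproduces by hand what the prepend-path line does to PATH.

```shell
# Scratch prefix so the sketch can be run anywhere; on the cluster the
# prefix would be /projects/cees/bin.
prefix="$(mktemp -d)"
root="$prefix/freebayes/0.9.14"
modulefile="$prefix/modules/freebayes/0.9.14"

# Installation directory and directory for the module files.
mkdir -p "$root" "$(dirname "$modulefile")"

# Stand-in executable (hypothetical; a real installation provides the binary).
printf '#!/bin/sh\necho "stand-in for FreeBayes"\n' > "$root/freebayes"
chmod +x "$root/freebayes"

# Module file with the same structure as the example above.
# The unquoted heredoc expands $root in the 'set' line, while \$root
# is written literally so that the module system resolves it later.
cat > "$modulefile" <<EOF
#%Module

## Short description of module.
module-whatis "
Name:          FreeBayes
Installed by:  (your name)"

## Commands.
set               root                 $root
prepend-path      PATH                 \$root
EOF

# Shell equivalent of the prepend-path line: put the directory first in PATH.
PATH="$root:$PATH"
export PATH

# The executable is now found without typing its full path.
command -v freebayes
freebayes
```

With a real installation, the last two commands would report the path of the installed binary and start FreeBayes itself.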


Note: see these instructions on how to make sure the module system can find our locally installed modules.
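
The linked instructions are not reproduced here. As a hedged sketch of one common approach, the standard `module use` command of Environment Modules adds a directory to the search path stored in the MODULEPATH variable. Whether this matches the recommended setup on abel is an assumption, and the fallback branch below is only there so that the sketch also runs in a shell without the module command.

```shell
# Directory holding the locally written module files.
local_modules=/projects/cees/bin/modules

if command -v module >/dev/null 2>&1; then
    # Standard Environment Modules command: prepends the directory to MODULEPATH.
    module use "$local_modules"
else
    # Fallback for shells without the module command: set the variable directly.
    export MODULEPATH="$local_modules${MODULEPATH:+:$MODULEPATH}"
fi

# Show the resulting search path.
echo "$MODULEPATH"
```

To make the setting permanent, such a line is typically placed in a shell startup file; follow the linked instructions for the recommended location.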

Program version defaults

When multiple versions of the same program are installed, a default version can be specified. For example, there are currently two versions of FreeBayes installed in /projects/cees/bin/freebayes/ (versions 9.9.2 and 0.9.14), and correspondingly, the modules directory /projects/cees/bin/modules/freebayes contains two module files with these version names. In this case, you may want to specify which version is loaded when the module is requested without a version number, as in the following command:

module load freebayes

This can be achieved with a file named '.version' in the same directory as the module files (here: /projects/cees/bin/modules/freebayes/.version). Note that the file name begins with a period, which makes it a hidden file that is not shown when directory contents are listed with the plain command

ls

To see hidden files in the current directory, you will have to instead use

ls -a

The content of the '.version' file is very simple:

#%Module
set ModulesVersion "0.9.14"

where, again, the first line allows interpretation of this file as a module file, and the second line specifies the version number that is to be used as the default program version.
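
The '.version' file can be written directly from the command line. This is a minimal sketch: the variable `module_dir` is a name chosen for illustration, and it defaults to a scratch location so that the commands can be tried without touching the shared directory.

```shell
# Module directory; defaults to a scratch location for a dry run.
# On the cluster this would be /projects/cees/bin/modules/freebayes.
module_dir="${module_dir:-$(mktemp -d)/modules/freebayes}"
mkdir -p "$module_dir"

# Write the two-line '.version' file; the quoted heredoc keeps the text verbatim.
cat > "$module_dir/.version" <<'EOF'
#%Module
set ModulesVersion "0.9.14"
EOF

# 'ls -a' is needed to see the hidden file.
ls -a "$module_dir"
cat "$module_dir/.version"
```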

Note: While using the default version may be a handy short-cut at times, it is recommended to explicitly specify program version numbers in all scripts. As the default may change over time, scripts using the default might produce different results if you have to rerun them at later stages.
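
One way to follow this recommendation is to check scripts for `module load` lines that name no version. The helper below is hypothetical (it is not part of the module system) and deliberately simple: it only catches lines that load a single module without a slash in its name. The demonstration script is a throwaway file created for this example.

```shell
# Hypothetical helper: print 'module load' lines that name no version,
# prefixed with their line numbers.
find_unversioned_loads() {
    grep -nE '^[[:space:]]*module[[:space:]]+load[[:space:]]+[^/[:space:]]+[[:space:]]*$' "$1"
}

# Throwaway script with one unversioned and one versioned load.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/bin/bash
module load samtools
module load freebayes/0.9.14
# analysis commands follow
EOF

# Only the samtools line is reported, because it relies on the default version.
unversioned="$(find_unversioned_loads "$demo_script")"
echo "$unversioned"
```

In this demonstration the helper reports line 2, the `module load samtools` line; replacing it with an explicit version such as samtools/0.1.19 makes the script independent of later changes to the default.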