How to run BLAST

BLAST is by definition an algorithm (i.e. a specific, logical way of solving a problem). It is also a program, available for download from the NCBI website. In addition, many websites provide user-friendly BLAST interfaces, including ready-to-use databases. There are thus many ways of running BLAST, each with advantages and disadvantages. If you are affiliated with the University of Oslo, these are the main alternatives:

Running BLAST on the NCBI website

The NCBI BLAST website is very user-friendly, and provides many useful links from the BLAST output list. This makes it easy to analyse the BLAST hits further. Many databases are also available. Often, the “nr” or “nr/nt” databases will suffice; these contain basically all known sequences (protein and nucleotide sequences, respectively). Another useful option is restricting the search to a single organism. It is also possible to fine-tune the BLAST search using many of the built-in BLAST parameters. Finally, the BLAST results can be downloaded in a number of different file formats, which is useful if you want to visualize and analyse the output in some other program.
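
As an illustration of analysing downloaded results in another program, the sketch below reads a tab-separated hit table and keeps the best-scoring hit for each query. It assumes the results were saved in the standard 12-column tabular layout (the same column order that the command-line programs produce with “-outfmt 6”); the function names and the example data are made up for this illustration.

```python
import csv

# Column order of the standard 12-column tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hit_table(text):
    """Return one dict per hit, skipping blank lines and '#' comment lines."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        # Convert the columns used for ranking and filtering to numbers.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Map each query ID to its hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up example: two hits for the hypothetical query "q1".
EXAMPLE = (
    "# hypothetical hit table\n"
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t360\n"
    "q1\ts2\t90.0\t180\t18\t0\t1\t180\t1\t180\t2e-60\t220\n"
)
```

With a real file, the text could be read with open("results.tsv").read() (a hypothetical file name) and passed to parse_hit_table.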

A disadvantage is the limited processing time available for each user. Thus it is not possible to run many queries at the same time. Also, it is not possible to use custom databases (these might, for instance, contain data not yet present in the NCBI system).

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Running BLAST on the UoO Lifeportal

If affiliated with UoO, you may use the Abel high-performance computing system to run BLAST searches. The Lifeportal website offers a visual user interface to make this easier. It is not quite as user-friendly as the NCBI website, but should still be easy to use. If you have an account on Abel, you can potentially run very large BLAST searches (these may take a long time to finish, though!). Also, you can upload and use custom databases (in the form of FASTA files).
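
Before uploading a FASTA file as a custom database, it can be worth checking it quickly. The sketch below counts the records and reports sequence identifiers that occur more than once, since duplicate identifiers can make the hits ambiguous. It is a minimal check under the assumption that the identifier is the first word after “>”; the function name is made up for this illustration.

```python
def summarize_fasta(text):
    """Return (number of records, sorted list of duplicated identifiers)."""
    seen = set()
    duplicates = set()
    n_records = 0
    for line in text.splitlines():
        if line.startswith(">"):
            n_records += 1
            header = line[1:].split()
            # Assumption: the identifier is the first word of the header line.
            seq_id = header[0] if header else ""
            if seq_id in seen:
                duplicates.add(seq_id)
            seen.add(seq_id)
    return n_records, sorted(duplicates)
```

For example, a file with the headers “>a desc”, “>b” and “>a” would be reported as three records with “a” duplicated.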

A disadvantage (though unavoidable) is the queue system on Abel: your BLAST searches will not be executed immediately, but will wait until the system has a slot of computing time available. The waiting time depends on the number of other people using Abel at the same time, and on how long your BLAST search is expected to run. The latter is another potential problem: you have to specify the expected run time yourself, which basically requires some experimentation. Be aware, however, that setting a long run time will cause your BLAST search to wait longer in the queue. Finally, unlike on the NCBI website, you will not get a visualization of the BLAST output. If your search includes many queries, using a BLAST output visualization program (as mentioned above) is recommended.

https://lifeportal.uio.no/root

Running BLAST manually from the command line

Behind their graphical user interfaces, both of the above websites use the same underlying BLAST program. You can also use this program directly, which gives you more direct control over the BLAST search. Using the program directly means running it from the command line. You can do this on your own local machine, or (provided you are affiliated with UoO) by logging in to Abel, the UoO high-performance computing cluster.

If you are using your own custom database and not too many queries, running BLAST on your local machine may be a good option. You do not need network access, you do not have to submit your BLAST search to a queue where it may have to wait a long time before execution, and you can use your computer's graphical tools to organize BLAST-related files and folders. It is, however, not practical to run big BLAST searches on a local computer, as they would take a long time to finish. It is difficult to define precisely when a BLAST search is “too big”. Often, it would be impractical to use the large “nr” or “nr/nt” databases on a local machine, simply because they require too much disk space (and would take forever to download). Such searches are best done on Abel. On the other hand, with a custom database (typically much smaller than “nr” or “nr/nt”), even hundreds of queries may be fine on a local machine. After all, leaving the computer on overnight is a possibility (and such a search would probably not start right away on Abel either, so no time would really be saved).

If deciding to use the local machine for BLASTing, the BLAST program must be downloaded and installed. The standard BLAST program (which is also used by the above websites) is available as a downloadable package from:

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download

This will download a zipped package. After unzipping it, the “bin” subfolder will contain the different BLAST programs (such as blastn.exe for nucleotide BLAST searches and blastp.exe for protein BLAST searches; the “.exe” ending applies to Windows only). The folder also contains some other programs that you may need to set up your BLAST search. The most important of these is “makeblastdb.exe”, which prepares a FASTA file for use as a database with BLAST.
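
A typical local search thus consists of two steps: preparing the database with makeblastdb, then running the search program against it. The sketch below only builds the two command lines as argument lists, so it can be inspected without BLAST being installed. The options shown (“-in”, “-dbtype”, “-out”, “-query”, “-db”, “-outfmt”, “-evalue”) are standard options of the BLAST command-line programs; the function names, file names and the E-value cutoff are assumptions chosen for this illustration.

```python
def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build the command that turns a FASTA file into a BLAST database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blast_command(program, query_path, db_name, out_path, evalue=1e-5):
    """Build a search command that writes tabular output (-outfmt 6)."""
    return [program,
            "-query", query_path,
            "-db", db_name,
            "-out", out_path,
            "-outfmt", "6",
            "-evalue", str(evalue)]


# Hypothetical file names, for illustration only.
DB_CMD = makeblastdb_command("mydb.fasta", "mydb")
SEARCH_CMD = blast_command("blastn", "queries.fasta", "mydb", "results.tsv")
```

Each list can be joined with spaces and typed at the command line, or passed to subprocess.run(DB_CMD, check=True) once BLAST is installed. For a protein database, use dbtype="prot" together with the blastp program.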

On Abel, you do not need to download BLAST, as it is already installed. Instead you need to load the BLAST package by typing “module load BLAST”.

After installation, these programs are used in almost exactly the same way on a local machine and on Abel. However, you may run into problems on the local machine if the BLAST “bin” folder has not been added to the computer's “PATH”, the list of folders in which the command line looks for programs.
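
Whether the BLAST programs can be found on the “PATH” can be checked with a few lines of Python. This is a minimal sketch using the standard-library function shutil.which; the function name and the default list of programs are assumptions for this illustration.

```python
import shutil


def missing_blast_programs(programs=("blastn", "blastp", "makeblastdb")):
    """Return the names of the given programs that are not found on the PATH."""
    return [name for name in programs if shutil.which(name) is None]
```

An empty list means that all programs were found. Any names returned point to a “bin” folder that still has to be added to the “PATH” (or, on Abel, to a BLAST module that has not been loaded yet).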