Computer directories and the PATH

From mn/ibv/bioinfwiki
Before trying to run BLAST from the command line, it is worth understanding how computers keep track of programs such as BLAST. This is closely tied to how computers organize files in directories.

The current directory

Imagine you have a text file called “myText.txt” somewhere on your computer, and you want to read it using the “more” command. You open the command window (or the terminal window on a Mac) and type “more myText.txt”. You will probably receive a message such as “myText.txt not found” (or something to that effect). The reason is simple: your computer does not know where to look for “myText.txt”.

When you start the command window, you are positioned in a specific directory, called the current directory. You can find out which directory this is by typing “cd” (PC) or “pwd” (Mac/Linux), and you can change it with the “cd” command. When you give the command “more myText.txt”, your computer looks for “myText.txt” in the current directory only. Unless the file actually is located there, it cannot be found.

The solution is simple: tell your computer exactly where the file is located. Assuming the file is in the directory “/data/text”, you type “more /data/text/myText.txt”. (Remember that on a PC the path separator is a backslash, not a slash, and that you also need to specify a drive letter, so your directory may be “c:\data\text”.) Alternatively, set your current directory to “/data/text” using the “cd” command (“cd /data/text”); the earlier “more myText.txt” command will then open the correct file.
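
The same behaviour can be reproduced in a few lines of Python, which resolves bare file names against the current directory just as the command window does. This is only a minimal sketch: the temporary directories stand in for “/data/text” and for some other starting directory, and all variable names are made up for the illustration.

```python
import os
import shutil
import tempfile

start_dir = os.getcwd()  # remember where we started so we can return

# Hypothetical stand-in for the "/data/text" directory from the text.
data_dir = tempfile.mkdtemp()
full_path = os.path.join(data_dir, "myText.txt")
with open(full_path, "w") as handle:
    handle.write("hello\n")

# Start out in a different (empty) directory, like a freshly opened terminal.
work_dir = tempfile.mkdtemp()
os.chdir(work_dir)

# A bare file name is looked up in the current directory only.
found_by_name = os.path.exists("myText.txt")

# The full path works regardless of the current directory.
found_by_full_path = os.path.exists(full_path)

# After changing the current directory (the "cd" command), the bare name works.
os.chdir(data_dir)
found_after_cd = os.path.exists("myText.txt")

# Return to the starting directory and remove the temporary directories.
os.chdir(start_dir)
shutil.rmtree(data_dir)
shutil.rmtree(work_dir)

print(found_by_name, found_by_full_path, found_after_cd)
```

The three printed values correspond to the three situations described above: the bare name from the wrong directory, the full path, and the bare name after changing the current directory.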

The PATH

The “PATH” variable is related to the example above. If you type “blastn” at the command line, your computer may not know where to look for the “blastn” program (nor, on a PC, that “blastn” refers to the file “blastn.exe”). You can avoid this problem by providing the full path and file extension. For instance, if BLAST has been installed in the “/programs/blast” directory, you can run “/programs/blast/bin/blastn.exe” (remember that BLAST typically keeps its program files in a “bin” sub-folder).

If you use BLAST a lot, however, this is a lot of typing. You can therefore ask the computer to remember that “blastn” really means “the blastn.exe file located in the /programs/blast/bin/ directory”. This works because the PATH is a list of directories that the computer searches whenever you type a program name: once “/programs/blast/bin/” is in that list, you can type “blastn” from any current directory and the computer will find “blastn.exe” there. This is called “adding a program to the PATH”.

Sometimes programs are added to the PATH automatically during installation. For instance, the current version of BLAST (denoted “BLAST+”) adds itself to the PATH during installation; the previous BLAST edition (using the “blastall.exe” executable) did not.

Understanding and modifying the PATH can be somewhat tricky: do it wrong and you risk breaking programs that used to work fine! An easy solution is to always use the full path when specifying programs and files. If this becomes cumbersome, you may use the full path when testing new programs and files, and gradually replace it with shorter names. That way you can detect exactly what, if anything, goes wrong.

Finally, if you load a module on Abel (for instance with “module load blast”), the programs in the loaded module are automatically added to the PATH. Often you will not even know where these programs are installed, so you must use the program names only.
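
To make the lookup concrete, here is a minimal Python sketch of how a program name is resolved against a list of directories. It uses the standard-library function shutil.which, which performs the same kind of search as the command window. The directory and the empty “blastn” file are hypothetical placeholders created only for this illustration (not a real BLAST installation), and the search list is passed in explicitly, so your real PATH is left untouched.

```python
import os
import shutil
import stat
import tempfile

# The real PATH is one long string of directories joined by a separator
# (":" on Mac/Linux, ";" on a PC). Splitting it gives the list that is searched.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# Hypothetical stand-in for "/programs/blast/bin": a temporary directory holding
# an empty placeholder file, marked as executable.
fake_bin = tempfile.mkdtemp()
exe_name = "blastn.exe" if os.name == "nt" else "blastn"
exe_path = os.path.join(fake_bin, exe_name)
with open(exe_path, "w") as handle:
    handle.write("")
os.chmod(exe_path, os.stat(exe_path).st_mode | stat.S_IXUSR)

# A second, empty directory that does not contain the program.
other_dir = tempfile.mkdtemp()

# Searching a list that lacks the program's directory finds nothing.
not_on_path = shutil.which("blastn", path=other_dir)

# After "adding the directory to the PATH" (here: to our own search list),
# the bare name "blastn" resolves to the full path of the file.
extended_path = fake_bin + os.pathsep + other_dir
on_path = shutil.which("blastn", path=extended_path)

print(len(path_dirs), "directories on the real PATH")
print("before:", not_on_path)
print("after: ", on_path)

# Remove the temporary directories again.
shutil.rmtree(fake_bin)
shutil.rmtree(other_dir)
```

Calling shutil.which("blastn") without the path argument searches your real PATH instead. It is a quick way to check whether, and from where, a program will be found before you rely on its short name.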