Submitting to ENA

From mn/bio/cees-bioinf
Revision as of 11:25, 25 January 2016 by Monica@uio.no


When your sequences are analyzed and the paper is ready, you will most likely be asked to submit your sequences to a public archive. We recommend ENA, the European Nucleotide Archive. Here you will find some advice on how to do this relatively painlessly. Thanks to Mari Espelund for being a test case and for reporting on her experience.


How to proceed:

  • You need a submission account, which you get by applying to ENA through their account registration webpage. Once you have your account, you can proceed with the submission itself.


  • Next, gather all of the sequence files that you will be submitting into one directory. You will need to calculate the md5sum of each file. This is a checksum: a 32-character code calculated from the contents of the file. If you change a comma or a whitespace, or add an extra line, the checksum changes. ENA uses it to check that the files they receive from you are complete and unchanged.

    To calculate the md5sum, you can either download a program for Windows or Mac that does this, or log into a Linux/Unix machine (for instance titan) and calculate it there. On these machines, simply run:

    md5sum nameOfYourSequenceFile

    The result is the 32-character checksum followed by the file name, something like this: d0e02ef5cac9e813839318061ee4edfb  nameOfYourSequenceFile

  • ENA has two portals, a test portal and a production portal. Start out with the test portal. Everything you do there, except uploading the files, is deleted after 25 hours. Using the test portal first is highly recommended, so that you can gather all of the details ENA asks for without the risk of submitting something wrong.

    Please note that the window you work in while submitting is very wide, and some fields you are supposed to fill out may end up outside your browser window. Scroll sideways to make sure you see everything.

    Also note that if you switch panes during submission without having filled in all of the details, you may end up with unrecoverable errors and have to start all over again.

  • In the first pane you are asked to upload your files. Click the link labelled SRA-FileUpLoader; this opens a small program on your computer. Log in with your login details, then select the directory where your sequence files are stored. Make sure all the files are marked for upload and press upload. This will take some time, so go get a coffee while you wait.
  • When you have filled out everything, you get to the last pane, where you fill in the file names and the md5sums of the files. Here you can also download a spreadsheet with all of the information you have filled out so far. NOTE: when you work on the production server, you can use this spreadsheet to go directly to this step.

  • Please note: when you press submit on the test server, YOU HAVE NOT REALLY SUBMITTED! This window just mimics what you will see on the production server. However, the files you uploaded stay uploaded, so there is no need to upload them again.
  • Last but not least, log in to the production server, start a new study, upload your spreadsheet, and enter the names of your files with their md5sums. When you then press submit, you have actually submitted. Within a few days you should get an email stating that ENA is working on your submission, and then, if everything is OK, you will get your accession numbers reasonably fast.
  • When uploading from Abel or the Cod nodes, you can use wput together with your ENA username and password. Here reads_to_upload is a file listing the files to upload. Make sure you are in the folder that contains the files, to avoid creating a folder structure on the ENA FTP server:

    /projects/cees/bin/wput/wput-0.6.1/wput -i reads_to_upload ftp://username:password@webin.ebi.ac.uk
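
The md5sum step described earlier can also be done for a whole directory at once. The sketch below assumes bash and GNU md5sum (as on titan or Abel); `make_md5_lists`, `md5sums.txt` and `files_and_md5.tsv` are hypothetical names chosen for this example, not anything ENA requires. It writes the standard md5sum output plus a tab-separated file-name/md5 list that can be copied into the file name and md5 columns of the ENA spreadsheet.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an ENA tool): checksum every file given as an
# argument and write two lists to the current directory:
#   md5sums.txt       - standard md5sum output ("<md5>  <file>")
#   files_and_md5.tsv - "<file><TAB><md5>", for the ENA spreadsheet columns
make_md5_lists() {
    if [ "$#" -eq 0 ]; then
        echo "usage: make_md5_lists FILE..." >&2
        return 1
    fi

    # One line per file: the 32-character hex checksum, two spaces, the name.
    md5sum -- "$@" > md5sums.txt || return 1

    # Swap the columns: "read" puts the first word in sum and the rest
    # of the line (the file name) in name.
    while read -r sum name; do
        printf '%s\t%s\n' "$name" "$sum"
    done < md5sums.txt > files_and_md5.tsv
}

# Example: checksum all gzipped FASTQ files in this directory
# (the *.fastq.gz pattern is an assumption; adjust it to your files):
#   make_md5_lists *.fastq.gz
# Later, confirm the files still match the recorded checksums:
#   md5sum -c md5sums.txt
```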
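
The wput upload from Abel or the Cod nodes can be wrapped in two small bash functions, sketched below under a few assumptions: `write_upload_list` and `upload_to_ena` are hypothetical helper names, reads_to_upload is taken to be a plain list of file names, one per line (which is how the wput command above appears to use it), and the wput path is the CEES one given above. Asking for the password at a prompt keeps it out of your shell history, although it is still passed to wput on the command line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the wput upload described above.
# Run them from the folder that holds the files, so that no folder
# structure is created on the ENA FTP server.

# Write the given file names, one per line, to reads_to_upload.
write_upload_list() {
    if [ "$#" -eq 0 ]; then
        echo "usage: write_upload_list FILE..." >&2
        return 1
    fi
    printf '%s\n' "$@" > reads_to_upload
}

# Prompt for the ENA login and run the wput command from this page.
upload_to_ena() {
    local user password
    read -rp "ENA username: " user
    # -s stops the password from being echoed to the terminal.
    read -rsp "ENA password: " password
    echo
    /projects/cees/bin/wput/wput-0.6.1/wput -i reads_to_upload \
        "ftp://${user}:${password}@webin.ebi.ac.uk"
}

# Usage (the *.fastq.gz pattern is an assumption):
#   write_upload_list *.fastq.gz
#   upload_to_ena
```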

    Congratulations on your submission!