5. Retrieve larger data sets from MARS

Back to ECMWF overview: http://muspelheim.nilu.no/w/index.php/ECMWF


Updated by nik: 12 March 2013


To retrieve large numbers of files from MARS you need to log in to the ECgate server and submit jobs in the form of scripts.


BATCH JOBS

The retrieval of data is done by submitting a shell script.

When running a program on the UNIX system, use the batch system (not interactive mode). That means you submit the job with explicit commands so that it runs unattended under Unix.

nohup is a way to submit a job to run unattended on a Unix system, but more sophisticated batch systems exist for handling jobs.
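
For example, a retrieval script could be started unattended with nohup (a minimal sketch; myscript.ksh is a placeholder name):

# run the script detached from the terminal; output is kept in myscript.log
nohup ./myscript.ksh > myscript.log 2>&1 &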

The batch system currently available on ECgate and HPCF is called LoadLeveler and jobs are submitted with the command llsubmit.


OBS OBS! From June 2013 the batch system on ECgate will change from LoadLeveler to SLURM, and jobs will be submitted with the command sbatch.

This page should be updated with the commands for the new batch system when it is up and running.
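
As a rough preview, the LoadLeveler keywords used below would map onto sbatch directives along these lines (a sketch using standard SLURM options only; the exact settings and conventions on ECgate may differ):

#SBATCH --job-name=request_to_MARS          # name of job
#SBATCH --workdir=/scratch/ms/no/sb9/test/  # initial working directory
#SBATCH --output=request_to_MARS.%j.out     # *.out file (%j = job ID)
#SBATCH --error=request_to_MARS.%j.err      # error file
#SBATCH --mail-type=END                     # notification on completion
#SBATCH --mail-user=<userId>@ecmwf.int      # change to your user ID

The job would then be submitted with sbatch myscript instead of llsubmit myscript.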

CREATING A SCRIPT

Log in to ECgate.

Create a ksh script in your home directory on ecgate.

(The default shell is either Korn (ksh) or C-shell (csh)).


In the beginning of the script, set the Batch System keywords:

#@ shell        = /usr/bin/ksh                   (Specify the shell)
#@ job_type     = serial                         (Indicates that this is a serial job)
#@ job_name     = request_to_MARS                (Name of job)
#@ initialdir   = /scratch/ms/no/sb9/test/       (Pathname of the initial working directory. OBS: Do not use environment variables like $USER or $SCRATCH in these keywords!)
#@ output       = $(job_name).$(jobid).out       (*.out file)
#@ error        = $(job_name).$(jobid).err       (Error file)
#@ class        = normal                         (Indicates the priority of the job, usually "normal")
#@ notification = complete                       (Sends notification on completion)
#@ notify_user  = <userId>@ecmwf.int             (Change to your ECMWF user ID; by default notifications go to your user ID)
#@ account_no   = spnoflex                       (FLEXPART account)
#@ queue                                         (Indicates the end of the keywords, mandatory)


Then add your request information which might look like this:

retrieve,
 class    = od,      ("Operational archive")
 stream   = oper,    ("operational Atmospheric model", for analysis data this would be "an")
 expver   = 1,       ("Experiment version", always use 1)
 date     = -1,      ("Specifies the Analysis/Forecast base date"; a value of -n means n days before today)
 time     = 00:00,   ("Specifies the Analysis/Forecast base time")
 step     = 0/to/72/by/6,
 type     = cf,
 levtype  = pl,
 levelist = 100/150/200/250/300/400/500/700/850/925/1000,
 param    = 129.128/130.128/131.128/132.128/133.128,
 grid     = 0.5/0.5,
 area     = 65/0/55/20,
 target   = "ecmwf_data.grib"  (Output file containing the data)

A summary of MARS keywords can be found here:

http://www.ecmwf.int/publications/manuals/mars/guide/MARS_keywords.html
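
Within the ksh script, the request is passed to the mars command, typically as a here-document. A minimal sketch (this cut-down request is an illustration, not the full request above):

# feed the MARS request to the mars command via a here-document
mars <<EOF
retrieve,
 class  = od,
 stream = oper,
 expver = 1,
 type   = an,
 date   = -1,
 time   = 00:00,
 param  = 130.128,
 target = "ecmwf_data.grib"
EOF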


To transfer files to zardoz, add the following to your ksh script:

ectrans -remote my_ms_association@genericFtp \
-source ecmwf_data.grib \
-mailto user@nilu.no \
-onfailure \
-remove \
-verbose

RETRIEVE, LIST, READ, COMPUTE and WRITE CALLS

Instead of the retrieve keyword in the above script, you can use other keywords to list, read, compute or write data.

As a rule, you should issue as few retrieve calls as possible and have each call collect as much data as possible.

This limits the number of calls to the archive, which matters because of how the archive is structured: some data are stored on tape and have to be staged from it.

Accessing the same tape many times through many separate retrieval commands is expensive.

You can collect a lot of data in a single retrieve call and then split it into multiple output files with the read/compute/write commands.
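
For example, a read call can extract a subset of an already retrieved file into a new target file, here keeping only the temperature fields (file names are placeholders):

read,
 source = "ecmwf_data.grib",
 param  = 130.128,
 target = "temperature.grib"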

However, to retrieve e.g. ERA-INTERIM data for a full year, it is best to split the retrieval into 12 calls (one per month; a ksh sketch follows this list), considering

- the architecture of MARS
- volume of data extracted
- restrictions on disk space
- restart availability
- queuing (the larger the request the slower)
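
A minimal ksh sketch of such a month-by-month split (the year, dates and parameter are hypothetical; class=ei is the standard ERA-Interim identifier, but verify the request against your own needs):

#!/usr/bin/ksh
# Retrieve ERA-Interim data one month at a time (hypothetical example).
# For brevity each date range ends on day 28; a real script would use
# the actual month length.
year=2010
for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
  mars <<EOF
retrieve,
 class  = ei,
 stream = oper,
 expver = 1,
 type   = an,
 date   = ${year}${month}01/to/${year}${month}28,
 time   = 00:00,
 param  = 130.128,
 target = "era_interim_${year}${month}.grib"
EOF
done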


By optimizing your request you can jump forward in the queue!

SUBMIT YOUR JOB

Submit your job as a batch job to LoadLeveler.

To submit your script:

llsubmit myscript


MONITOR YOUR JOB

llq -u <UserId>   To view where your job is in the queue
llq -l <jobId>    To get a detailed description for a job
llq -s <jobId>    To determine why the job is not running
llcancel <jobId>  To cancel your script


See man llq for more options
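
When ECgate moves to SLURM, the rough equivalents of these commands would be (standard SLURM commands; ECgate specifics may differ):

squeue -u <UserId>         # view where your jobs are in the queue
scontrol show job <jobId>  # get a detailed description of a job
scancel <jobId>            # cancel your job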