Retrieve large data sets from MARS with ecmwfapi


ecmwfapi is a Python interface that lets you retrieve data from ECMWF directly on your own computer (your script does not need to run at ECMWF). To use it you need:

  • to set up your environment for Python
  • to install your ECMWF API key


The easiest way to set up your environment is to load the flexpart modulefile:

       module load flexpart


Note: if you want to run WRF instead, load wrf instead of flexpart:

       module load wrf


If you have any problems with your environment, look here.

To install your API key:

To access ECMWF you need an API key. First log in at https://apps.ecmwf.int/auth/login/ and then retrieve your key at https://api.ecmwf.int/v1/key/. This requires an account on the ECMWF web site; if you don't have one, please self-register at https://apps.ecmwf.int/registration/

Copy the information shown on the key page and paste it into the file $HOME/.ecmwfapirc.

Content of $HOME/.ecmwfapirc:
{
    "url"   : "https://api.ecmwf.int/v1",
    "key"   : "XXXXXXXXXXXXXXXXXXXXXX",
    "email" : "john.smith@example.com"
}


If you need to retrieve data to run your model on abel, copy this file to abel too.
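
A key file that was pasted incompletely is an easy mistake to make. The following is a minimal sketch, not part of ecmwfapi, that checks that the file parses as JSON and contains the three fields shown above. The function name check_ecmwfapirc is hypothetical.

```python
import json
import os


def check_ecmwfapirc(path=None):
    """Return the required fields that are missing or empty in the key file.

    Hypothetical helper, not part of ecmwfapi. Raises ValueError if the
    file is not valid JSON or does not contain a JSON object.
    """
    if path is None:
        # Default location used in the text above: $HOME/.ecmwfapirc
        path = os.path.join(os.path.expanduser("~"), ".ecmwfapirc")
    with open(path) as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("%s does not contain a JSON object" % path)
    return [field for field in ("url", "key", "email") if not config.get(field)]
```

Calling check_ecmwfapirc() with no argument checks $HOME/.ecmwfapirc; an empty list means all three fields are present.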


 Example

#!/usr/bin/env python
#
# (C) Copyright 2012-2013 ECMWF.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation nor
# does it submit to any jurisdiction.
#

from ecmwfapi import ECMWFDataServer

# To run this example, you need an API key
# available from https://api.ecmwf.int/v1/key/

server = ECMWFDataServer()

current_date=20120801

server.retrieve({
          'dataset' : "interim",
          'date'    : "%s"%(current_date),
          'time'     : "00",
          'step'     : "0",
          'stream'   : "oper",
          'levtype'  : "sfc",
          'levelist' : "all",
          'type'     : "an",
          'class'    : "ei",
          'grid'     : "128",
          'param'    : "148",
          'target'   : "ERA_148_%s.grb"%(current_date),
          })


Specifying class is not enough and you also need to set "dataset" (see below).

To run this script, save it to a file and type ./<filename>.py (you may first have to make it executable with chmod u+x <filename>.py).


Script to specify domain and time period

A more complete download script is shown below. To run it, specify the start and end dates on the command line:

python wrf_era_interim.py --start_year 1992 --end_year 1992                            # one year
or
./wrf_era_interim.py --start_year 1992 --end_year 1992 --start_month 01 --end_month 01 # one month
./wrf_era_interim.py --start_year 1992 --end_year 1992 --start_month 01 --end_month 01 --start_day 01 --end_day 05 
or, if you want the data in one file for each day:
for i in {1..31}; do ./wrf_era_interim.py --start_year=1992 --end_year=1992 --start_month=01 --end_month=01 --start_day="$i" --end_day="$i"; done


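The script writes one GRIB file per request, with analysis and forecast fields downloaded separately for surface and pressure-level parameters. That gives up to four files per period; the version shown below issues three requests (an_sfc, an_pl and fc_sfc). To combine them into one file, concatenate them with cat file1.grb file2.grb > fileall.mars, for example: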

cat an_sfc_19920101_19920131.grb an_pl_19920101_19920131.grb fc_sfc_19920101_19920131.grb > ma199201.mars
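
The same concatenation can be done from Python, for instance at the end of a download script. This is a sketch that assumes, as the cat command does, that the GRIB files can simply be joined byte for byte. concatenate_files is a hypothetical helper name.

```python
import shutil


def concatenate_files(parts, target):
    """Append the files listed in `parts`, in order, into `target`.

    Files are opened in binary mode so that GRIB data is copied unchanged.
    """
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For the January 1992 example this would be called as concatenate_files(["an_sfc_19920101_19920131.grb", "an_pl_19920101_19920131.grb", "fc_sfc_19920101_19920131.grb"], "ma199201.mars").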

Then, make sure that all pressure levels are present in the .mars file (otherwise, real.exe will give an error message about num_metgrid_levels):

grib_ls ma199201.mars   # or g1print ma199201.mars

The number of metgrid levels should be 38 (starting at 1,2,3,5,7,10,20... and ending at 950,975,1000 hPa)
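
Once the levels have been read off the grib_ls listing, a small check can report which ones are missing. The sketch below is only an illustration: the level list is an assumption (the 37 pressure levels commonly listed for ERA-Interim, which together with the surface would give the 38 metgrid levels mentioned above), and missing_levels is a hypothetical helper name. Verify the list against the ECMWF documentation before relying on it.

```python
# Assumed ERA-Interim pressure levels in hPa (37 values); not taken from
# this page apart from the first and last few, so verify before use.
EXPECTED_LEVELS = [
    1, 2, 3, 5, 7, 10, 20, 30, 50, 70,
    100, 125, 150, 175, 200, 225, 250,
    300, 350, 400, 450, 500, 550, 600, 650, 700, 750,
    775, 800, 825, 850, 875, 900, 925, 950, 975, 1000,
]


def missing_levels(found_levels, expected=EXPECTED_LEVELS):
    """Return the expected levels that do not appear in `found_levels`."""
    found = set(int(level) for level in found_levels)
    return [level for level in expected if level not in found]
```

An empty result means every expected pressure level was found in the file.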


#!/usr/bin/env python
"""Download ERA-Interim analysis and forecast fields, one set of requests per month."""
import calendar
from optparse import OptionParser

from ecmwfapi import ECMWFDataServer


def main():
    usage = ("usage: %prog --start_year YYYY [--end_year YYYY] "
             "[--start_month MM] [--end_month MM] [--start_day DD] [--end_day DD]")
    parser = OptionParser(usage=usage)
    parser.add_option("-s", "--start_year", dest="start_year",
                      help="start year YYYY", metavar="start_year", type="int")
    parser.add_option("-e", "--end_year", dest="end_year",
                      help="end year YYYY", metavar="end_year", type="int")
    parser.add_option("--start_month", dest="start_month",
                      help="start month MM", metavar="start_month", type="int")
    parser.add_option("--end_month", dest="end_month",
                      help="end month MM", metavar="end_month", type="int")
    parser.add_option("--start_day", dest="start_day",
                      help="start day DD", metavar="start_day", type="int")
    parser.add_option("--end_day", dest="end_day",
                      help="end day DD", metavar="end_day", type="int")
    (options, args) = parser.parse_args()

    if not options.start_year:
        parser.error("start year must be specified!")
    start_year = options.start_year
    # Defaults: a single year, and the whole year if no months are given.
    end_year = options.end_year if options.end_year else start_year
    start_month = options.start_month if options.start_month else 1
    end_month = options.end_month if options.end_month else 12

    server = ECMWFDataServer()
    print(start_year)
    print(end_year)
    for year in range(start_year, end_year + 1):
        print("YEAR %s" % year)
        for month in range(start_month, end_month + 1):
            # Defaults: first and last day of the month.
            if not options.start_day:
                sdate = "%s%02d01" % (year, month)
            else:
                sdate = "%s%02d%02d" % (year, month, options.start_day)
            if not options.end_day:
                lastday = calendar.monthrange(year, month)[1]
                edate = "%s%02d%s" % (year, month, lastday)
            else:
                edate = "%s%02d%02d" % (year, month, options.end_day)
            print("get data from %s to %s (YYYYMMDD)" % (sdate, edate))

            # Analysis, surface parameters
            server.retrieve({
                'dataset'  : "interim",
                'date'     : "%s/to/%s" % (sdate, edate),
                'time'     : "00/06/12/18",
                'step'     : "00",
                'stream'   : "oper",
                'levtype'  : "sfc",
                'type'     : "an",
                'class'    : "ei",
                'grid'     : "0.5/0.5",
                'param'    : "165/166/167/168/134/151/235/31/34/33/141/139/170/183/236/39/40/41/42",
                #### 'param'    : "167.128/168.128/165.128/166.128/134.128/151.128/39.128/40.128/41.128/42.128/139.128/170.128/183.128/236.128/31.128/172.128/129.128/235.128/141.128/34.128",
                'area'     : "70./-9./46./37.",
                'target'   : "an_sfc_%s_%s.grb" % (sdate, edate),
                })

            # Analysis, pressure-level parameters
            server.retrieve({
                'dataset'  : "interim",
                'date'     : "%s/to/%s" % (sdate, edate),
                'time'     : "00/06/12/18",
                'step'     : "00",
                'stream'   : "oper",
                'levtype'  : "pl",
                'levelist' : "all",
                'type'     : "an",
                'class'    : "ei",
                'grid'     : "0.5/0.5",
                'param'    : "129/130/131/132/133/157",
                #### 'param'    : "130.128/131.128/132.128/157.128/129.128",
                'frame'    : "OFF",
                'area'     : "70./-9./46./37.",
                'target'   : "an_pl_%s_%s.grb" % (sdate, edate),
                })

            # Forecast, surface parameters
            server.retrieve({
                'dataset'  : "interim",
                'date'     : "%s/to/%s" % (sdate, edate),
                'time'     : "00/12",
                'step'     : "06/12",
                'stream'   : "oper",
                'levtype'  : "sfc",
                'type'     : "fc",
                'class'    : "ei",
                'grid'     : "0.5/0.5",
                'param'    : "169.128/175.128/228.128",
                'area'     : "70./-9./46./37.",
                'target'   : "fc_sfc_%s_%s.grb" % (sdate, edate),
                })


if __name__ == "__main__":
    main()
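
The date handling inside the month loop above can be isolated into a small function, which makes it possible to test it without contacting the server. This is a sketch under that assumption; month_date_range is a hypothetical name that the script itself does not define.

```python
import calendar


def month_date_range(year, month, start_day=None, end_day=None):
    """Return (sdate, edate) as YYYYMMDD strings for one month.

    Without start_day/end_day the range covers the whole month, using
    calendar.monthrange to find the last day (leap years included).
    """
    first = start_day if start_day else 1
    last = end_day if end_day else calendar.monthrange(year, month)[1]
    return ("%04d%02d%02d" % (year, month, first),
            "%04d%02d%02d" % (year, month, last))
```

The 'date' entry of a request can then be built as "%s/to/%s" % month_date_range(1992, 2), which gives "19920201/to/19920229".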


 Choosing a dataset

The "dataset" parameter is one of:

Dataset              Description                                                              Licence
era15                ECMWF Global Reanalysis Data - ERA-15 (Jan 1979 - Dec 1993)              general
era20cmv0            ERA-20CM: Ensemble of climate model integrations (Experimental version)  general
era40                ECMWF Global Reanalysis Data - ERA-40 (Sep 1957 - Aug 2002)              general
eraclim              ERA-20CM: Ensemble of climate model integrations                         general
icoads               ICOADS v2.5.1 with interpolated 20CR feedback                            general
interim              ECMWF Global Reanalysis Data - ERA Interim (Jan 1979 - present)          general
ispd                 ISPD v2.2                                                                general
macc                 MACC                                                                     macc
macc_ghg_inversions  N/A                                                                      macc_ghg_inversions
macc_nrealtime       MACC Near Real-time                                                      macc_nrealtime
tigge                TIGGE (THORPEX Interactive Grand Global Ensemble)                        tigge
yotc                 YOTC (Year of Tropical Convection)                                       general

To access these datasets, you need to agree to the corresponding terms and conditions, which can be found under the "Licence" link in the table above. See http://apps.ecmwf.int/datasets/ for the content of the datasets. The other parameters are described at: http://www.ecmwf.int/publications/manuals/mars/guide/index.html
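
Because "dataset" has to be set in addition to "class", it can help to check a request dictionary before submitting it. The sketch below is a hypothetical helper, not part of ecmwfapi. Its list of names is copied from the table above and may not match what the server currently offers.

```python
# Dataset names copied from the table on this page (may be out of date).
KNOWN_DATASETS = frozenset([
    "era15", "era20cmv0", "era40", "eraclim", "icoads", "interim", "ispd",
    "macc", "macc_ghg_inversions", "macc_nrealtime", "tigge", "yotc",
])


def check_request(request):
    """Return `request` unchanged, or raise ValueError if 'dataset' is unusable."""
    dataset = request.get("dataset")
    if dataset is None:
        raise ValueError("request has no 'dataset' entry; 'class' alone is not enough")
    if dataset not in KNOWN_DATASETS:
        raise ValueError("unknown dataset %r" % (dataset,))
    return request
```

A call such as server.retrieve(check_request({...})) then fails locally, before any request is sent, if "dataset" was forgotten or misspelled.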