MPAS | WRF | Extract subset of GFS GRIB files

GFS subset download

NOMADS Data at NCEP

  • https://nomads.ncep.noaa.gov/
  • https://nomads.ncep.noaa.gov/gribfilter.php?ds=gfs_0p25_1hr
  • Example:
    https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl?dir=%2Fgfs.20250228%2F00%2Fatmos&file=gfs.t00z.pgrb2.0p25.anl&var_TMP=on&var_U-GWD=on&var_V-GWD=on&all_lev=on
  • Download script:
    #!/bin/bash
    #
    # define URL
    #
    workdir=/home/wpsze/mpas/GFS/
    hr=00
    #date_st=20240920
    date_st=$(date -d "today -1 days" +%Y%m%d)

    for iday in {0..0..1}; do

        idate=$(date -d "$date_st + $iday days" +%Y%m%d)
        echo $date_st $idate

        for fhr in {000..120..1}; do
        #for fhr in {076..76..1}; do

            # NOTE: the continuation lines of the quoted URL must start at
            # column 0, or the leading spaces become part of the URL
            URL="https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl?\
    dir=%2Fgfs.${idate}%2F${hr}%2Fatmos&\
    file=gfs.t${hr}z.pgrb2.0p25.f${fhr}&\
    var_HGT=on&var_PRATE=on&var_PRES=on&var_PRMSL=on&var_RH=on&var_TMP=on&var_UGRD=on&var_VGRD=on&\
    lev_2_m_above_ground=on&lev_10_m_above_ground=on&lev_80_m_above_ground=on&lev_100_m_above_ground=on&\
    lev_1000_m_above_ground=on&lev_4000_m_above_ground=on&lev_1000_mb=on&lev_975_mb=on&lev_950_mb=on&\
    lev_925_mb=on&lev_900_mb=on&lev_850_mb=on&lev_700_mb=on&lev_600_mb=on&lev_500_mb=on&lev_surface=on&lev_mean_sea_level=on&\
    subregion=&toplat=65&leftlon=70&rightlon=140&bottomlat=10"

            yyyymm=${idate:0:6}
            datapath=$workdir/0p25/$yyyymm/$idate/
            mkdir -p "$datapath"

            # download file
            #curl "$URL" -o download_test$fhr.grb
            wget "$URL" -O "$datapath/gfs.t${hr}z.pgrb2.0p25.f${fhr}.grb"
            # add a sleep to avoid hammering the server in case of missing files
            sleep 3
        done
    done
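The long filter URL in the script above can also be assembled programmatically, which makes it easier to add or remove variables and levels. A minimal sketch (the `build_filter_url` helper is illustrative, not part of any NOMADS API; variable and level names are taken from the script above):

```python
from urllib.parse import urlencode

def build_filter_url(idate: str, hr: str, fhr: int) -> str:
    """Assemble a NOMADS grib-filter URL for one GFS 0.25-degree forecast hour."""
    base = "https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25_1hr.pl"
    params = {
        "dir": f"/gfs.{idate}/{hr}/atmos",
        "file": f"gfs.t{hr}z.pgrb2.0p25.f{fhr:03d}",  # forecast hour zero-padded to 3 digits
        "subregion": "",
        "toplat": 65, "leftlon": 70, "rightlon": 140, "bottomlat": 10,
    }
    for var in ("HGT", "PRATE", "PRES", "PRMSL", "RH", "TMP", "UGRD", "VGRD"):
        params[f"var_{var}"] = "on"
    for lev in ("2_m_above_ground", "10_m_above_ground", "surface", "mean_sea_level"):
        params[f"lev_{lev}"] = "on"
    # urlencode percent-encodes the slashes in dir, as in the hand-built URL
    return base + "?" + urlencode(params)

url = build_filter_url("20250212", "00", 6)
print(url)
```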
NCEP GFS Forecasts (0.25 degree grid)

rda.ucar.edu

#!/usr/bin/env csh
#
# c-shell script to download selected files from rda.ucar.edu using Wget
# NOTE: if you want to run under a different shell, make sure you change
# the 'set' commands according to your shell's syntax
# after you save the file, don't forget to make it executable
# i.e. - "chmod 755 <name_of_script>"
#
# Experienced Wget Users: add additional command-line flags to 'opts' here
# Use the -r (--recursive) option with care
# Do NOT use the -b (--background) option - simultaneous file downloads
# can cause your data access to be blocked
set opts = "-N"
#
# Check wget version. Set the --no-check-certificate option
# if wget version is 1.10 or higher
set v = `wget -V |grep 'GNU Wget ' | cut -d ' ' -f 3`
set a = `echo $v | cut -d '.' -f 1`
set b = `echo $v | cut -d '.' -f 2`
if (100 * $a + $b > 109) then
    set cert_opt = "--no-check-certificate"
else
    set cert_opt = ""
endif

set filelist= ( \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2024123000.f384-25.2025011200.f195.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025011200.f198-25.2025011618.f135.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025011618.f138-25.2025012100.f360.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012100.f366-25.2025012512.f129.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012512.f132-25.2025012918.f330.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012918.f336-25.2025020306.f105.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025020306.f108-25.2025020712.f228.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025020712.f231-25.2025021200.f003.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025021200.f006-25.2025021606.f084.grib2.tar \
https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025021606.f087-25.2025021712.f384.grib2.tar \
)
while ($#filelist > 0)
    set syscmd = "wget $cert_opt $opts $filelist[1]"
    echo "$syscmd ..."
    $syscmd
    shift filelist
end
#!/usr/bin/env python
"""
Python script to download selected files from rda.ucar.edu.
After you save the file, don't forget to make it executable
i.e. - "chmod 755 <name_of_script>"
"""
import sys, os
from urllib.request import build_opener

opener = build_opener()

filelist = [
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2024123000.f384-25.2025011200.f195.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025011200.f198-25.2025011618.f135.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025011618.f138-25.2025012100.f360.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012100.f366-25.2025012512.f129.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012512.f132-25.2025012918.f330.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025012918.f336-25.2025020306.f105.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025020306.f108-25.2025020712.f228.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025020712.f231-25.2025021200.f003.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025021200.f006-25.2025021606.f084.grib2.tar',
'https://request.rda.ucar.edu/dsrqst/SZE782753/TarFiles/gfs.0p25.2025021606.f087-25.2025021712.f384.grib2.tar'
]

for file in filelist:
    ofile = os.path.basename(file)
    sys.stdout.write("downloading " + ofile + " ... ")
    sys.stdout.flush()
    infile = opener.open(file)
    with open(ofile, "wb") as outfile:
        outfile.write(infile.read())
    sys.stdout.write("done\n")
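One caveat with the Python script above: `infile.read()` loads an entire tar file into memory before writing it out, and these RDA tar files can be several gigabytes. A sketch of a streaming alternative (the `stream_download` helper is hypothetical) that copies each file to disk in fixed-size chunks:

```python
#!/usr/bin/env python
"""Stream each remote file to disk in chunks instead of reading the
whole tar into memory."""
import os
import shutil
from urllib.request import urlopen

def stream_download(url: str, dest_dir: str = ".") -> str:
    """Download url into dest_dir, copying in 1 MiB chunks; return the local path."""
    ofile = os.path.join(dest_dir, os.path.basename(url))
    with urlopen(url) as resp, open(ofile, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)
    return ofile
```

Peak memory stays at one chunk (1 MiB) regardless of the tar size.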

Micromamba env setup

 $ cat list.yml 
name: PGW
channels:
- anaconda
- conda-forge
dependencies:
- conda-forge::cdo
- conda-forge::dask
- matplotlib
- python=3.11
- scipy
- conda-forge::wgrib

grib checking

Take the GFS-wave model as an example:

  • grib_ls gfs.t00z.global.0p16.f000.grib2
    gfs.t00z.global.0p16.f000.grib2
    edition centre date dataType gridType stepRange typeOfLevel level shortName packingType
    2 kwbc 20250227 fc regular_ll 0 surface 1 ws grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 wdir grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 u grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 v grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 swh grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 perpw grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 dirpw grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 shww grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 1 swell grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 2 swell grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 3 swell grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 mpww grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 1 swper grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 2 swper grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 3 swper grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 surface 1 wvdir grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 1 swdir grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 2 swdir grid_jpeg
    2 kwbc 20250227 fc regular_ll 0 orderedSequenceData 3 swdir grid_jpeg
    19 of 19 messages in gfs.t00z.global.0p16.f000.grib2

    19 of 19 total messages in 1 files
  • If needed, re-download a dataset that contains only surface variables.
  • grib_to_netcdf gfs.t00z.global.0p16.f000.grib2 -o test.nc
  • ncvis/ncview test.nc

Read GFS grib2 files

To process a CSV file containing station information (latitude and longitude) and extract weather data from GFS GRIB files, you can follow the steps below. This example assumes you have a CSV file named stations.csv with columns for station names, latitude, and longitude.

 $ grib_ls ./subset-GFS/gfs.0p25.2025021200.f006.grib2
./subset-GFS/gfs.0p25.2025021200.f006.grib2
edition centre date dataType gridType typeOfLevel level stepRange shortName packingType
2 kwbc 20250212 fc regular_ll surface 0 0-6 dswrf grid_complex_spatial_differencing
1 of 1 messages in ./subset-GFS/gfs.0p25.2025021200.f006.grib2

Step 1: Prepare Your CSV File

Ensure your stations.csv has the following structure:

station,latitude,longitude
Station1,22.3,114.2
Station2,22.6,114.6
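Before running the extraction, it can help to sanity-check the station file. A small sketch (the example contents are read inline here for illustration; in practice, pass the path to stations.csv):

```python
import io
import pandas as pd

# The example stations.csv contents from above
csv_text = """station,latitude,longitude
Station1,22.3,114.2
Station2,22.6,114.6
"""

stations = pd.read_csv(io.StringIO(csv_text))

# Validate the columns and coordinate ranges before touching the GRIB data
assert {"station", "latitude", "longitude"} <= set(stations.columns)
assert stations["latitude"].between(-90, 90).all()
# GFS longitudes run 0..360; convert western longitudes to 0..360 first
assert stations["longitude"].between(0, 360).all()
print(stations)
```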

Step 2: Code to Extract Data and Save to CSV

Here's a complete Python script that reads the station information, extracts the required data from the GRIB files, and writes the results to a new CSV file.

#!/usr/bin/env python
import xarray as xr
import pandas as pd

# Step 1: Load the station data
stations = pd.read_csv('stations.csv')

# Step 2: Initialize an empty list to store results
results = []

# Step 3: Path to the GFS GRIB file
# Replace with your actual path to the GRIB file
grib_file = './subset-GFS/gfs.0p25.2025021200.f006.grib2'

# Step 4: Open the GRIB file using xarray
# ds = xr.open_mfdataset(grib_files, engine='cfgrib', combine='by_coords')
ds = xr.open_dataset(grib_file, engine='cfgrib')  # ,backend_kwargs={'filter_by_keys': {'typeOfLevel': 'isobaricInhPa'}})

# Print the names of the variables
print(ds)

# Step 5: Extract data for each station
for index, row in stations.iterrows():
    lat = row['latitude']
    lon = row['longitude']

    # Step 6: Interpolate the data using bilinear interpolation
    # t2m = ds['t2m'].interp(latitude=lat, longitude=lon, method='linear').values
    # u10 = ds['u10'].interp(latitude=lat, longitude=lon, method='linear').values
    # v10 = ds['v10'].interp(latitude=lat, longitude=lon, method='linear').values
    # rh2m = ds['rh2m'].interp(latitude=lat, longitude=lon, method='linear').values
    sdswrf = ds['sdswrf'].interp(latitude=lat, longitude=lon, method='linear').values

    # Step 7: Append the results
    results.append({
        'station': row['station'],
        'sdswrf': sdswrf
        # 't2m': t2m,
        # 'u10': u10,
        # 'v10': v10,
        # 'rh2m': rh2m
    })

# Step 8: Create a DataFrame from the results
output_df = pd.DataFrame(results)

# Step 9: Save the results to a CSV file
output_df.to_csv('output.csv', index=False)

Step 3: Run the Script

  1. Make sure the GRIB file path in the script ('./subset-GFS/gfs.0p25.2025021200.f006.grib2') points to your actual GRIB file.
  2. Save the script to a .py file and run it using Python.

Output

The output will be a CSV file named output.csv with the following columns:

  • station
  • sdswrf (surface downward shortwave radiation flux)
  • t2m (2 m temperature), u10/v10 (10 m wind components), and rh2m (2 m relative humidity), if you uncomment the corresponding lines in the script
 $ cat output.csv 
station,sdswrf
Station1,52.232960662841684
Station2,47.779839172363275

Note

  • Ensure that the variable names (t2m, u10, v10, rh2m, sdswrf) match how your GRIB data is structured; cfgrib short names can differ between datasets.
  • You may need to adjust the extraction logic based on the specific GRIB messages you are working with.
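As an alternative to interpolation, a station can simply be snapped to the nearest grid point. For the GFS 0.25-degree grid (latitude running 90 to -90, longitude 0 to 359.75 eastward) the index arithmetic is simple; the `nearest_gfs_index` helper below is illustrative:

```python
def nearest_gfs_index(lat: float, lon: float, res: float = 0.25):
    """Return (i, j) indices of the nearest GFS 0.25-degree grid point.
    Latitude runs 90 -> -90, longitude runs 0 -> 359.75 (eastward)."""
    lon = lon % 360.0                      # normalise western longitudes to 0..360
    i = round((90.0 - lat) / res)          # row index, counted from the north pole
    j = round(lon / res)                   # column index, counted from Greenwich
    return i, j

# Station1 from the example CSV (22.3N, 114.2E)
print(nearest_gfs_index(22.3, 114.2))  # → (271, 457)
```

With xarray this corresponds to `ds['sdswrf'].sel(latitude=lat, longitude=lon, method='nearest')`, which is faster but less smooth than bilinear interpolation.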

Read Multiple GFS grib files

Code to Extract Data and Save to CSV

#!/usr/bin/env python

#------------------------------------------------#
#Author: wpsze
#Email: wpsze
#date: 2025-02-19 14:29:21
#Version: 0.0
#Description: Extract station time series from multiple GFS GRIB files
#Copyright (C): 2025 All rights reserved
#------------------------------------------------#

import xarray as xr
import pandas as pd
import glob

# Step 1: Load the station data
stations = pd.read_csv('stations.csv')

# Step 2: Get and sort the list of GFS GRIB files
grib_files = sorted(glob.glob('./subset-GFS/gfs.0p25.2025021200.f*.grib2'))  # Adjust the path and pattern

# Step 3: Interpolate data for each station across all GRIB files
results = []

# GFS grib file
# Coordinates:
#   * time       (time) datetime64[ns] 824B 2025-02-12 2025-02-12 ... 2025-02-12
#     step       (time) timedelta64[ns] 824B 0 days 06:00:00 ... 16 days 00:00:00
#     surface    float64 8B 0.0
#   * latitude   (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
#   * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
#     valid_time (time) datetime64[ns] 824B 2025-02-12T06:00:00 ... 2025-02-28

for grib_file in grib_files:
    print(f"Processing: {grib_file}")

    # Step 4: Open the GRIB file using xarray
    # ds = xr.open_mfdataset(grib_files, engine='cfgrib', combine='by_coords')
    ds = xr.open_dataset(grib_file, engine='cfgrib', decode_timedelta=True)  # ,backend_kwargs={'filter_by_keys': {'typeOfLevel': 'isobaricInhPa'}})

    ## Print the names of the variables
    # print(ds)
    # selected_time = ds.time.values  # This is the initial time!!
    selected_time = ds.valid_time.values  # This is the valid time
    ## selected_time is in numpy datetime64 format
    selected_time_utc = pd.to_datetime(selected_time)                     # Convert to pandas datetime
    selected_time_utc_plus_8 = selected_time_utc + pd.Timedelta(hours=8)  # UTC+8 (BJT)

    print(selected_time)

    # Step 5: Extract data for each station
    for index, row in stations.iterrows():
        lat = row['latitude']
        lon = row['longitude']

        # Step 6: Interpolate the data using bilinear interpolation
        # t2m = ds['t2m'].interp(latitude=lat, longitude=lon, method='linear').values
        # u10 = ds['u10'].interp(latitude=lat, longitude=lon, method='linear').values
        # v10 = ds['v10'].interp(latitude=lat, longitude=lon, method='linear').values
        # rh2m = ds['rh2m'].interp(latitude=lat, longitude=lon, method='linear').values
        sdswrf = ds['sdswrf'].interp(latitude=lat, longitude=lon, method='linear').values

        # Step 7: Append the results (both UTC and UTC+8 timestamps)
        results.append({
            'station': row['station'],
            'datetime[UTC]': selected_time_utc,
            'datetime[BJT]': selected_time_utc_plus_8,
            'sdswrf': sdswrf
            # 't2m': t2m,
            # 'u10': u10,
            # 'v10': v10,
            # 'rh2m': rh2m
        })

# Step 8: Create a DataFrame from the results
output_df = pd.DataFrame(results)

# Step 9: Save the results to a CSV file
output_df.to_csv('output_multiple.csv', index=False)

Output

station,datetime[UTC],datetime[BJT],sdswrf
Station1,2025-02-12 06:00:00,2025-02-12 14:00:00,52.232960662841684
Station2,2025-02-12 06:00:00,2025-02-12 14:00:00,47.779839172363275
Station1,2025-02-12 09:00:00,2025-02-12 17:00:00,62.86639785766609
Station2,2025-02-12 09:00:00,2025-02-12 17:00:00,40.13679992675776
Station1,2025-02-12 12:00:00,2025-02-12 20:00:00,33.28959899902347
Station2,2025-02-12 12:00:00,2025-02-12 20:00:00,22.780800170898374
Station1,2025-02-12 15:00:00,2025-02-12 23:00:00,0.0
Station2,2025-02-12 15:00:00,2025-02-12 23:00:00,0.0
Station1,2025-02-12 18:00:00,2025-02-13 02:00:00,0.0
Station2,2025-02-12 18:00:00,2025-02-13 02:00:00,0.0
Station1,2025-02-12 21:00:00,2025-02-13 05:00:00,0.0
Station2,2025-02-12 21:00:00,2025-02-13 05:00:00,0.0
Station1,2025-02-13 00:00:00,2025-02-13 08:00:00,2.2220799827575504
Station2,2025-02-13 00:00:00,2025-02-13 08:00:00,3.2377599430084407
Station1,2025-02-13 03:00:00,2025-02-13 11:00:00,146.60479980468722
Station2,2025-02-13 03:00:00,2025-02-13 11:00:00,215.17280273437623
Station1,2025-02-13 06:00:00,2025-02-13 14:00:00,154.97856079101527
Station2,2025-02-13 06:00:00,2025-02-13 14:00:00,259.47327941894736
Station1,2025-02-13 09:00:00,2025-02-13 17:00:00,137.50800354003917
......
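The long-format CSV above (one row per station per valid time) can be pivoted into a wide table with one column per station. A sketch using pandas (the inline sample rows mirror the output format; the time column is named `datetime` here rather than `datetime[UTC]` for brevity):

```python
import io
import pandas as pd

# A few rows in the same long format as output_multiple.csv above
csv_text = """station,datetime,sdswrf
Station1,2025-02-12 06:00:00,52.23
Station2,2025-02-12 06:00:00,47.78
Station1,2025-02-12 09:00:00,62.87
Station2,2025-02-12 09:00:00,40.14
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# One row per valid time, one sdswrf column per station
wide = df.pivot(index="datetime", columns="station", values="sdswrf")
print(wide)
```

The wide layout is convenient for plotting each station's time series directly with `wide.plot()`.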

xarray's interp method with the "linear" option on a 2D array

When using xarray's interp method with method="linear" on a 2D array, interpolation is performed along each specified dimension in turn; on a latitude-longitude grid this amounts to bilinear interpolation.

How It Works for 2D Arrays

  1. Data Structure: A 2D array in xarray is a DataArray (or a variable in a Dataset) with two dimensions, e.g. latitude and longitude.
  2. Linear Interpolation: For each requested coordinate, the method finds the four surrounding grid points and weights their values by the fractional distances along each dimension. Interpolating first along one dimension and then along the other yields the bilinear result.
  • Applications: This is useful wherever gridded values must be estimated at arbitrary coordinates, e.g. in geospatial analysis, climate modeling, or extracting station time series from model output as in the scripts above.

Using interp this way gives smooth, accurate transitions between grid points in multi-dimensional datasets.
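The bilinear value that interp produces at a single point can be reproduced by hand, which is a useful cross-check. A self-contained sketch (the grid values are made up for illustration):

```python
def bilinear(x0, x1, y0, y1, f00, f01, f10, f11, x, y):
    """Bilinear interpolation on the rectangle [x0,x1] x [y0,y1].
    f_ij is the value at (x_i, y_j); this mirrors what xarray's
    interp(..., method='linear') computes at each requested point."""
    tx = (x - x0) / (x1 - x0)             # fractional position along x
    ty = (y - y0) / (y1 - y0)             # fractional position along y
    top = f00 * (1 - ty) + f01 * ty       # interpolate along y at x0
    bot = f10 * (1 - ty) + f11 * ty       # interpolate along y at x1
    return top * (1 - tx) + bot * tx      # then interpolate along x

# Four 0.25-degree grid values around a station at (22.3N, 114.2E)
val = bilinear(22.25, 22.5, 114.0, 114.25, 10.0, 12.0, 14.0, 16.0, 22.3, 114.2)
print(val)  # ≈ 12.4
```

At a grid point the formula returns that point's value exactly, and it varies linearly along each grid edge, which is exactly the behaviour described above.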


MPAS | WRF | Extract subset of GFS GRIB files
https://waipangsze.github.io/2025/02/19/MPAS-WRF-Extract-GFS-GRIB-files/
Author
wpsze
Posted on
February 19, 2025
Updated on
February 28, 2025
Licensed under