Solution 4#
About#
So far, we analysed Aerosol Optical Depth from different types of data (satellite, model-based and ground-based observations) for a single dust event. Let us now broaden our view and analyse the annual cycle in 2020 of Aerosol Optical Depth from AERONET and compare it with the CAMS global reanalysis data.
Tasks#
1. Download and plot time-series of AERONET data for Santa Cruz, Tenerife in 2020
Hint
you can select daily aggregates of the station observations by setting the
AVG
key toAVG=20
Interpret the results:
Have there been other times in 2020 with increased AOD values?
If yes, how could you find out if the increase in AOD is caused by dust? Try to find out by visualizing the AOD time-series together with another parameter from the AERONET data.
MSG SEVIRI Dust RGB and MODIS RGB quick looks might be helpful to get a more complete picture of other events that might have happened in 2020
2. Download CAMS global reanalysis (EAC4) and select 2020 time-series for Santa Cruz, Tenerife
Hint
CAMS global forecast - Example notebook (Note: the notebook works with CAMS forecast data, but they have a similar data structure to the CAMS global reanalysis data)
Data access with the following specifications:
Variable on single levels:
Dust aerosol optical depth at 550 nm
Date:Start=2020-01-01
,End=2020-12-31
Time:[00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, 21:00]
Restricted area:N: 30., W: -20, E: 14, S: 20.
Format:netCDF
With the xarray function
sel()
and keyword argumentmethod='nearest'
you can select data based on coordinate informationWe also recommend you to transform your xarray.DataArray into a pandas.DataFrame with the function
to_dataframe()
3. Visualize both time-series of CAMS reanalysis and AERONET daily aggregates in one plot
Interpret the results: What can you say about the annual cycle in 2020 of AOD in Santa Cruz, Tenerife?
Load required libraries
import xarray as xr
import pandas as pd
import wget
from IPython.display import HTML
import matplotlib.pyplot as plt
import matplotlib.colors
from matplotlib.cm import get_cmap
from matplotlib import animation
from matplotlib.axes import Axes
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature
from cartopy.mpl.geoaxes import GeoAxes
GeoAxes._pcolormesh_patched = Axes.pcolormesh
import warnings
warnings.simplefilter(action = "ignore", category = RuntimeWarning)
Load helper functions
%run ../../functions.ipynb
1. Download and plot time-series of AERONET data for Santa Cruz, Tenerife in 2020#
Define latitude / longtiude values for Santa Cruz, Tenerife. You can see an overview of all available AERONET Site Names here.
lat = 28.473
lon = -16.247
As a first step, let us create a Python dictionary in which we store all the parameters we would like to use for the request as dictionary keys. You can initiate a dictionary with curled brackets {}
. Below, we specify the following parameters:
endpoint
: Endpoint of the AERONET web servicestation
: Name of the AERONET stationyear
: year 1 of interestmonth
: month 1 of interestday
: day 1 of interestyear2
: year 2 of interestmonth2
: month 2 of interestday2
: day 2 of interestAOD15
: data type, other options includeAOD10
,AOD20
, etc.AVG
: data format,AVG=10
- all points,AVG=20
- daily averages
The keywords below are those we will need for requesting daily averaged observations of Aerosol Optical Depth Level 1.5 data for the station Santa Cruz, Tenerife from 1 January to 31 December 2020.
data_dict = {
'endpoint': 'https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3',
'station':'Santa_Cruz_Tenerife',
'year': 2020,
'month': 1,
'day': 1,
'year2': 2020,
'month2': 12,
'day2': 31,
'AOD15': 1,
'AVG': 20
}
In a next step, we construct the final string for the wget request with the format
function. You construct a string by adding the dictionary keys in curled brackets. At the end of the string, you provide the dictionary key informatoin to the string with the format()
function. A print of the resulting url shows, that the format function replaced the information in the curled brackets with the data in the dictionary.
url = '{endpoint}?site={station}&year={year}&month={month}&day={day}&year2={year2}&month2={month2}&day2={day2}&AOD15={AOD15}&AVG={AVG}'.format(**data_dict)
url
'https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3?site=Santa_Cruz_Tenerife&year=2020&month=1&day=1&year2=2020&month2=12&day2=31&AOD15=1&AVG=20'
Now we are ready to request the data with the function download()
from the wget Python library. You have to pass to the function the constructed url above together with a file path of where the downloaded that shall be stored. Let us store the data as txt
file in the folder ../data/2_observations/aeronet/
.
wget.download(url, '../../eodata/case_study/aeronet/2020_santa_cruz_tenerife_20.txt')
After we downloaded the station observations as txt
file, we can open it with the pandas function read_table()
. We additonally set specific keyword arguments that allow us to specify the columns and rows of interest:
delimiter
: specify the delimiter in the text file, e.g. commaheader
: specify the index of the row that shall be set as header.index_col
: specify the index of the column that shall be set as index
You see below that the resulting dataframe has 296 rows and 81 columns.
df = pd.read_table('../../eodata/case_study/aeronet/2020_santa_cruz_tenerife_20.txt', delimiter=',', header=[7], index_col=1)
df
AERONET_Site | Time(hh:mm:ss) | Day_of_Year | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | ... | N[440-675_Angstrom_Exponent] | N[500-870_Angstrom_Exponent] | N[340-440_Angstrom_Exponent] | N[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | AERONET_Instrument_Number | AERONET_Site_Name | Site_Latitude(Degrees) | Site_Longitude(Degrees) | Site_Elevation(m)<br> | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date(dd:mm:yyyy) | |||||||||||||||||||||
01:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 1.0 | 0.061198 | 0.076232 | 0.080599 | -999.0 | -999.0 | 0.085592 | -999.0 | ... | 87.0 | 87.0 | 87.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
03:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 3.0 | 0.039688 | 0.052219 | 0.056409 | -999.0 | -999.0 | 0.063661 | -999.0 | ... | 131.0 | 131.0 | 131.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
04:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 4.0 | 0.046246 | 0.058887 | 0.063752 | -999.0 | -999.0 | 0.073848 | -999.0 | ... | 39.0 | 39.0 | 39.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
05:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 5.0 | 0.039780 | 0.050607 | 0.055417 | -999.0 | -999.0 | 0.065148 | -999.0 | ... | 59.0 | 59.0 | 58.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
06:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 6.0 | 0.025086 | 0.033686 | 0.037128 | -999.0 | -999.0 | 0.042951 | -999.0 | ... | 33.0 | 33.0 | 32.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
28:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 363.0 | 0.281631 | 0.361706 | 0.378314 | -999.0 | -999.0 | 0.398796 | -999.0 | ... | 95.0 | 95.0 | 95.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
29:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 364.0 | 0.225135 | 0.279970 | 0.290588 | -999.0 | -999.0 | 0.304814 | -999.0 | ... | 35.0 | 35.0 | 35.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
30:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 365.0 | 0.055233 | 0.075359 | 0.079524 | -999.0 | -999.0 | 0.086272 | -999.0 | ... | 34.0 | 34.0 | 34.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
31:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 366.0 | 0.020104 | 0.025035 | 0.026090 | -999.0 | -999.0 | 0.028000 | -999.0 | ... | 56.0 | 56.0 | 55.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
NaN | </body></html> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
296 rows × 81 columns
Now, we can inspect the entries in the loaded data frame a bit more. Above you see that the last entry is a NaN entry, which is best to drop with the function dropna()
.
The next step is then to replace the entries with -999.0 and set them as NaN. We can use the function replace()
to do so.
df = df.dropna()
df = df.replace(-999.0, np.nan)
df
AERONET_Site | Time(hh:mm:ss) | Day_of_Year | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | ... | N[440-675_Angstrom_Exponent] | N[500-870_Angstrom_Exponent] | N[340-440_Angstrom_Exponent] | N[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | AERONET_Instrument_Number | AERONET_Site_Name | Site_Latitude(Degrees) | Site_Longitude(Degrees) | Site_Elevation(m)<br> | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date(dd:mm:yyyy) | |||||||||||||||||||||
01:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 1.0 | 0.061198 | 0.076232 | 0.080599 | NaN | NaN | 0.085592 | NaN | ... | 87.0 | 87.0 | 87.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
03:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 3.0 | 0.039688 | 0.052219 | 0.056409 | NaN | NaN | 0.063661 | NaN | ... | 131.0 | 131.0 | 131.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
04:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 4.0 | 0.046246 | 0.058887 | 0.063752 | NaN | NaN | 0.073848 | NaN | ... | 39.0 | 39.0 | 39.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
05:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 5.0 | 0.039780 | 0.050607 | 0.055417 | NaN | NaN | 0.065148 | NaN | ... | 59.0 | 59.0 | 58.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
06:01:2020 | Santa_Cruz_Tenerife | 12:00:00 | 6.0 | 0.025086 | 0.033686 | 0.037128 | NaN | NaN | 0.042951 | NaN | ... | 33.0 | 33.0 | 32.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
27:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 362.0 | 0.087975 | 0.114314 | 0.119534 | NaN | NaN | 0.125963 | NaN | ... | 80.0 | 80.0 | 80.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
28:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 363.0 | 0.281631 | 0.361706 | 0.378314 | NaN | NaN | 0.398796 | NaN | ... | 95.0 | 95.0 | 95.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
29:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 364.0 | 0.225135 | 0.279970 | 0.290588 | NaN | NaN | 0.304814 | NaN | ... | 35.0 | 35.0 | 35.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
30:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 365.0 | 0.055233 | 0.075359 | 0.079524 | NaN | NaN | 0.086272 | NaN | ... | 34.0 | 34.0 | 34.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
31:12:2020 | Santa_Cruz_Tenerife | 12:00:00 | 366.0 | 0.020104 | 0.025035 | 0.026090 | NaN | NaN | 0.028000 | NaN | ... | 56.0 | 56.0 | 55.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
295 rows × 81 columns
Let us now convert the index entry to a DateTimeIndex
format with the function to_datetime()
. Important here, you have to specify the format of the index string: %d:%m:%Y
.
You see below that we do not have observations for every day. E.g on 2 January 2020, the data frame does not list any entry.
df.index = pd.to_datetime(df.index, format = '%d:%m:%Y')
df
AERONET_Site | Time(hh:mm:ss) | Day_of_Year | AOD_1640nm | AOD_1020nm | AOD_870nm | AOD_865nm | AOD_779nm | AOD_675nm | AOD_667nm | ... | N[440-675_Angstrom_Exponent] | N[500-870_Angstrom_Exponent] | N[340-440_Angstrom_Exponent] | N[440-675_Angstrom_Exponent[Polar]] | Data_Quality_Level | AERONET_Instrument_Number | AERONET_Site_Name | Site_Latitude(Degrees) | Site_Longitude(Degrees) | Site_Elevation(m)<br> | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date(dd:mm:yyyy) | |||||||||||||||||||||
2020-01-01 | Santa_Cruz_Tenerife | 12:00:00 | 1.0 | 0.061198 | 0.076232 | 0.080599 | NaN | NaN | 0.085592 | NaN | ... | 87.0 | 87.0 | 87.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-01-03 | Santa_Cruz_Tenerife | 12:00:00 | 3.0 | 0.039688 | 0.052219 | 0.056409 | NaN | NaN | 0.063661 | NaN | ... | 131.0 | 131.0 | 131.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-01-04 | Santa_Cruz_Tenerife | 12:00:00 | 4.0 | 0.046246 | 0.058887 | 0.063752 | NaN | NaN | 0.073848 | NaN | ... | 39.0 | 39.0 | 39.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-01-05 | Santa_Cruz_Tenerife | 12:00:00 | 5.0 | 0.039780 | 0.050607 | 0.055417 | NaN | NaN | 0.065148 | NaN | ... | 59.0 | 59.0 | 58.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-01-06 | Santa_Cruz_Tenerife | 12:00:00 | 6.0 | 0.025086 | 0.033686 | 0.037128 | NaN | NaN | 0.042951 | NaN | ... | 33.0 | 33.0 | 32.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2020-12-27 | Santa_Cruz_Tenerife | 12:00:00 | 362.0 | 0.087975 | 0.114314 | 0.119534 | NaN | NaN | 0.125963 | NaN | ... | 80.0 | 80.0 | 80.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-12-28 | Santa_Cruz_Tenerife | 12:00:00 | 363.0 | 0.281631 | 0.361706 | 0.378314 | NaN | NaN | 0.398796 | NaN | ... | 95.0 | 95.0 | 95.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-12-29 | Santa_Cruz_Tenerife | 12:00:00 | 364.0 | 0.225135 | 0.279970 | 0.290588 | NaN | NaN | 0.304814 | NaN | ... | 35.0 | 35.0 | 35.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-12-30 | Santa_Cruz_Tenerife | 12:00:00 | 365.0 | 0.055233 | 0.075359 | 0.079524 | NaN | NaN | 0.086272 | NaN | ... | 34.0 | 34.0 | 34.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
2020-12-31 | Santa_Cruz_Tenerife | 12:00:00 | 366.0 | 0.020104 | 0.025035 | 0.026090 | NaN | NaN | 0.028000 | NaN | ... | 56.0 | 56.0 | 55.0 | 0.0 | lev15 | 1090.0 | Santa_Cruz_Tenerife | 28.472528 | -16.247361 | 52.000000<br> |
295 rows × 81 columns
We can now plot the column AOD_500nm
as annual time-series. You see that the station Santa Cruz, Tenerife
was affected by other dust events later in 2020.
# Initiate a figure
fig = plt.figure(figsize=(20,8))
ax = plt.subplot()
# Define the plotting function
ax.plot(df.AOD_500nm, 'o-', color='green', label='AERONET observations')
# Customize the title and axes lables
ax.set_title('\nAerosol Optical Depth at 500 nm - Santa Cruz Tenerife\n', fontsize=20)
ax.set_ylabel('~', fontsize=14)
ax.set_xlabel('\nDay', fontsize=14)
# Customize the fontsize of the axes tickes
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
# Add a gridline to the plot
ax.grid(linestyle='--')
plt.legend(fontsize=16, loc=2)
<matplotlib.legend.Legend at 0x7f6faa4667c0>
The next question is now, how you can find out, if the strong increase of AOD at the end of August 2020 was because of dust? For this to find out, you can use the Angstrom Exponent
, which gives us an indication of the particle size. If the Angstrom Exponent
is below 0.6, then it is an indication that the increase of AOD is caused by coarser dust particles.
Let us visualize the AOD at 500nm for 2020 together with the Angstrom Exponent 440-675nm
.
# Initiate a figure
fig = plt.figure(figsize=(20,8))
ax = plt.subplot()
# Define the plotting function
ax.plot(df.AOD_500nm, 'o-', color='green', label='AERONET observations')
ax.plot(df['440-675_Angstrom_Exponent'], '-', color='lightgrey', label='Angstrom Exponent - 440-675nm')
plt.axhline(y=0.6, color='r', linestyle='dotted', label='Angstrom Exponent <0.6 is dust')
# Customize the title and axes lables
ax.set_title('\nAerosol Optical Depth at 500 nm - Santa Cruz Tenerife\n', fontsize=20)
ax.set_ylabel('~', fontsize=14)
ax.set_xlabel('\nDay', fontsize=14)
# Customize the fontsize of the axes tickes
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
# Add a gridline to the plot
ax.grid(linestyle='--')
plt.legend(fontsize=16, loc=2)
<matplotlib.legend.Legend at 0x7f23a7572700>
Above, you see that the Angstrom Exponent during the high AOD values at the end of August is very low. Hence, we could interpret this event as a strong dust intrusion. But is this really the case? You can also check here the MSG SEVIRI Dust RGB for e.g. 26 August 2020 and here the MODIS RGB to better understand the event and what could have caused the high AOD values.
2. Download CAMS global reanalysis (EAC4) and select 2020 time-series for Santa Cruz, Tenerife#
First, we have to download the CAMS global reanalysis (EAC4)
from the Copernicus Atmosphere Data Store with the following specifications:
Variable on single levels:
Dust aerosol optical depth at 550 nm
Date:
Start=2020-01-01
,End=2020-12-31
Time:
[00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, 21:00]
Restricted area:
N: 30., W: -20, E: 14, S: 20.
Format:
netCDF
See CDSAPI
request below.
URL = 'https://ads.atmosphere.copernicus.eu/api/v2'
KEY = '######################'
import cdsapi
c = cdsapi.Client(url=URL, key=KEY)
c.retrieve(
'cams-global-reanalysis-eac4',
{
'variable': 'dust_aerosol_optical_depth_550nm',
'date': '2020-01-01/2020-12-31',
'time': [
'00:00', '03:00', '06:00',
'09:00', '12:00', '15:00',
'18:00', '21:00',
],
'area': [
30, -20, 20,
15,
],
'format': 'netcdf',
},
'../../eodata/case_study/cams/2020_dustAOD_cams_eac4.nc'}
The data is in netCDF, so we can open the netCDF file with the xarray function open_dataset()
. We see that the data has three dimensions (latitude
, longitude
, time
) and one data variable:
duaod550
: Dust Aerosol Optical Depth at 550nm
file = xr.open_dataset('../../eodata/case_study/cams/2020_dustAOD_cams_eac4.nc')
file
<xarray.Dataset> Dimensions: (latitude: 14, longitude: 47, time: 2928) Coordinates: * longitude (longitude) float32 -20.0 -19.25 -18.5 -17.75 ... 13.0 13.75 14.5 * latitude (latitude) float32 29.75 29.0 28.25 27.5 ... 21.5 20.75 20.0 * time (time) datetime64[ns] 2020-01-01 ... 2020-12-31T21:00:00 Data variables: duaod550 (time, latitude, longitude) float32 ... Attributes: Conventions: CF-1.6 history: 2021-11-10 14:13:05 GMT by grib_to_netcdf-2.23.0: /opt/ecmw...
- latitude: 14
- longitude: 47
- time: 2928
- longitude(longitude)float32-20.0 -19.25 -18.5 ... 13.75 14.5
- units :
- degrees_east
- long_name :
- longitude
array([-20. , -19.25, -18.5 , -17.75, -17. , -16.25, -15.5 , -14.75, -14. , -13.25, -12.5 , -11.75, -11. , -10.25, -9.5 , -8.75, -8. , -7.25, -6.5 , -5.75, -5. , -4.25, -3.5 , -2.75, -2. , -1.25, -0.5 , 0.25, 1. , 1.75, 2.5 , 3.25, 4. , 4.75, 5.5 , 6.25, 7. , 7.75, 8.5 , 9.25, 10. , 10.75, 11.5 , 12.25, 13. , 13.75, 14.5 ], dtype=float32)
- latitude(latitude)float3229.75 29.0 28.25 ... 20.75 20.0
- units :
- degrees_north
- long_name :
- latitude
array([29.75, 29. , 28.25, 27.5 , 26.75, 26. , 25.25, 24.5 , 23.75, 23. , 22.25, 21.5 , 20.75, 20. ], dtype=float32)
- time(time)datetime64[ns]2020-01-01 ... 2020-12-31T21:00:00
- long_name :
- time
array(['2020-01-01T00:00:00.000000000', '2020-01-01T03:00:00.000000000', '2020-01-01T06:00:00.000000000', ..., '2020-12-31T15:00:00.000000000', '2020-12-31T18:00:00.000000000', '2020-12-31T21:00:00.000000000'], dtype='datetime64[ns]')
- duaod550(time, latitude, longitude)float32...
- units :
- ~
- long_name :
- Dust Aerosol Optical Depth at 550nm
[1926624 values with dtype=float32]
- Conventions :
- CF-1.6
- history :
- 2021-11-10 14:13:05 GMT by grib_to_netcdf-2.23.0: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data3/adaptor.mars.internal-1636553579.433729-7035-5-eecc5af7-eee6-4de9-9763-28ef1ca46941.nc /cache/tmp/eecc5af7-eee6-4de9-9763-28ef1ca46941-adaptor.mars.internal-1636553535.9643896-7035-7-tmp.grib
Let us now store the data variable Dust Aerosol Optical Depth (AOD) at 550nm
as xarray.DataArray
with the name duaod_cams
.
duaod_cams = file['duaod550']
duaod_cams
<xarray.DataArray 'duaod550' (time: 2928, latitude: 14, longitude: 47)> [1926624 values with dtype=float32] Coordinates: * longitude (longitude) float32 -20.0 -19.25 -18.5 -17.75 ... 13.0 13.75 14.5 * latitude (latitude) float32 29.75 29.0 28.25 27.5 ... 21.5 20.75 20.0 * time (time) datetime64[ns] 2020-01-01 ... 2020-12-31T21:00:00 Attributes: units: ~ long_name: Dust Aerosol Optical Depth at 550nm
- time: 2928
- latitude: 14
- longitude: 47
- ...
[1926624 values with dtype=float32]
- longitude(longitude)float32-20.0 -19.25 -18.5 ... 13.75 14.5
- units :
- degrees_east
- long_name :
- longitude
array([-20. , -19.25, -18.5 , -17.75, -17. , -16.25, -15.5 , -14.75, -14. , -13.25, -12.5 , -11.75, -11. , -10.25, -9.5 , -8.75, -8. , -7.25, -6.5 , -5.75, -5. , -4.25, -3.5 , -2.75, -2. , -1.25, -0.5 , 0.25, 1. , 1.75, 2.5 , 3.25, 4. , 4.75, 5.5 , 6.25, 7. , 7.75, 8.5 , 9.25, 10. , 10.75, 11.5 , 12.25, 13. , 13.75, 14.5 ], dtype=float32)
- latitude(latitude)float3229.75 29.0 28.25 ... 20.75 20.0
- units :
- degrees_north
- long_name :
- latitude
array([29.75, 29. , 28.25, 27.5 , 26.75, 26. , 25.25, 24.5 , 23.75, 23. , 22.25, 21.5 , 20.75, 20. ], dtype=float32)
- time(time)datetime64[ns]2020-01-01 ... 2020-12-31T21:00:00
- long_name :
- time
array(['2020-01-01T00:00:00.000000000', '2020-01-01T03:00:00.000000000', '2020-01-01T06:00:00.000000000', ..., '2020-12-31T15:00:00.000000000', '2020-12-31T18:00:00.000000000', '2020-12-31T21:00:00.000000000'], dtype='datetime64[ns]')
- units :
- ~
- long_name :
- Dust Aerosol Optical Depth at 550nm
Now, we can select the time-series of the grid point nearest to the station in Santa Cruz, Tenerife. We can use the function sel()
to select data based on the longitude and latitude dimensions. The keyword argument method='nearest'
selects the grid point entry closest to the station coordinates.
cams_ts = duaod_cams.sel(longitude=lon, latitude=lat, method='nearest')
cams_ts
<xarray.DataArray 'duaod550' (time: 2928)> array([0.007244, 0.021985, 0.060917, ..., 0.005728, 0.004677, 0.003822], dtype=float32) Coordinates: longitude float32 -16.25 latitude float32 28.25 * time (time) datetime64[ns] 2020-01-01 ... 2020-12-31T21:00:00 Attributes: units: ~ long_name: Dust Aerosol Optical Depth at 550nm
- time: 2928
- 0.007244 0.02198 0.06092 0.07449 ... 0.005728 0.004677 0.003822
array([0.007244, 0.021985, 0.060917, ..., 0.005728, 0.004677, 0.003822], dtype=float32)
- longitude()float32-16.25
- units :
- degrees_east
- long_name :
- longitude
array(-16.25, dtype=float32)
- latitude()float3228.25
- units :
- degrees_north
- long_name :
- latitude
array(28.25, dtype=float32)
- time(time)datetime64[ns]2020-01-01 ... 2020-12-31T21:00:00
- long_name :
- time
array(['2020-01-01T00:00:00.000000000', '2020-01-01T03:00:00.000000000', '2020-01-01T06:00:00.000000000', ..., '2020-12-31T15:00:00.000000000', '2020-12-31T18:00:00.000000000', '2020-12-31T21:00:00.000000000'], dtype='datetime64[ns]')
- units :
- ~
- long_name :
- Dust Aerosol Optical Depth at 550nm
The next step is now to resample the 3-hourly time entries and aggregate it to daily averages. We can use a combination of the functions resample()
and mean()
to create daily averages.
cams_ts_resample = cams_ts.resample(time='1D').mean()
cams_ts_resample
<xarray.DataArray 'duaod550' (time: 366)> array([6.26771897e-02, 1.21897265e-01, 6.45051599e-02, 6.81166351e-03, 1.11380219e-03, 2.36323476e-03, 1.70692801e-03, 2.12018192e-03, 2.59661674e-03, 1.27915293e-02, 2.31949985e-03, 1.18672848e-03, 6.82000369e-02, 9.66019481e-02, 1.06641322e-01, 5.51756322e-02, 6.65611029e-03, 3.57866287e-03, 5.35264611e-04, 5.40107489e-04, 1.26868486e-04, 2.28971243e-04, 5.88308275e-03, 7.26868212e-03, 6.05811179e-03, 4.26903367e-03, 1.19990855e-02, 9.62656736e-03, 8.38011652e-02, 1.65890560e-01, 1.58602878e-01, 6.69117123e-02, 4.65308130e-03, 7.19046295e-02, 3.36282521e-01, 3.49263191e-01, 8.95184875e-02, 6.27355129e-02, 1.27191618e-01, 1.28401518e-02, 2.42699534e-02, 7.66301751e-02, 1.55321255e-01, 2.17939615e-01, 4.48830277e-01, 3.32869619e-01, 6.46899045e-02, 5.61430752e-02, 2.86746323e-02, 1.25387311e-02, 2.85427272e-03, 4.57480550e-04, 1.68058857e-01, 7.46165454e-01, 5.34614563e-01, 4.31294233e-01, 1.71836376e-01, 5.27052581e-03, 2.51790732e-02, 1.20623484e-01, 1.22927919e-01, 5.87635338e-02, 9.09666717e-03, 1.24505162e-03, 9.78703797e-03, 1.80081427e-02, 3.58837843e-03, 1.31649762e-01, 1.49847046e-01, 5.58611155e-02, 8.08890164e-02, 1.23239055e-01, 1.12538531e-01, 1.09684721e-01, 2.63993740e-02, 5.64485788e-03, 1.56059861e-04, 5.24136424e-03, 1.17954418e-01, 1.49613649e-01, ... 1.51704177e-01, 9.84493941e-02, 4.13101166e-02, 2.73425579e-02, 4.11496758e-02, 9.22799408e-02, 2.60631740e-03, 1.46314502e-04, 1.17123127e-04, 8.80983472e-03, 8.88815969e-02, 9.45503116e-02, 3.46885324e-02, 3.57580930e-02, 4.34055179e-02, 9.80604440e-02, 2.00432733e-01, 8.51429701e-02, 5.67653924e-02, 8.52298737e-03, 4.79409099e-03, 3.50505114e-04, 1.65790319e-04, 5.30421734e-04, 9.14469361e-04, 1.81388855e-03, 5.34346700e-03, 2.23447382e-02, 2.38129646e-02, 5.62014282e-02, 1.12864256e-01, 6.75437450e-02, 1.27838194e-01, 6.38682693e-02, 5.41449189e-02, 5.96240610e-02, 5.70862293e-02, 7.16469884e-02, 3.21166962e-02, 5.39456010e-02, 1.17779389e-01, 7.62315542e-02, 5.20691276e-04, 5.88297844e-05, 1.17167830e-04, 3.99127603e-04, 8.31186771e-05, 3.01882625e-04, 3.35916877e-04, 1.56059861e-04, 1.38117373e-03, 3.31127644e-03, 3.93986702e-05, 7.88062811e-04, 3.60235572e-04, 2.97054648e-04, 2.97039747e-04, 2.58132815e-04, 6.32479787e-04, 6.32479787e-04, 3.45647335e-04, 1.31756067e-04, 4.42564487e-05, 1.07407570e-04, 1.12280250e-04, 1.02549791e-04, 1.26838684e-04, 7.34031200e-05, 1.72642767e-02, 1.09888896e-01, 2.62545496e-01, 3.42311025e-01, 9.04908031e-02, 6.92404509e-02, 2.15455323e-01, 1.82337582e-01, 7.28429556e-02, 1.04822367e-02], dtype=float32) Coordinates: * time (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31 longitude float32 -16.25 latitude float32 28.25
- time: 366
- 0.06268 0.1219 0.06451 0.006812 ... 0.2155 0.1823 0.07284 0.01048
array([6.26771897e-02, 1.21897265e-01, 6.45051599e-02, 6.81166351e-03, 1.11380219e-03, 2.36323476e-03, 1.70692801e-03, 2.12018192e-03, 2.59661674e-03, 1.27915293e-02, 2.31949985e-03, 1.18672848e-03, 6.82000369e-02, 9.66019481e-02, 1.06641322e-01, 5.51756322e-02, 6.65611029e-03, 3.57866287e-03, 5.35264611e-04, 5.40107489e-04, 1.26868486e-04, 2.28971243e-04, 5.88308275e-03, 7.26868212e-03, 6.05811179e-03, 4.26903367e-03, 1.19990855e-02, 9.62656736e-03, 8.38011652e-02, 1.65890560e-01, 1.58602878e-01, 6.69117123e-02, 4.65308130e-03, 7.19046295e-02, 3.36282521e-01, 3.49263191e-01, 8.95184875e-02, 6.27355129e-02, 1.27191618e-01, 1.28401518e-02, 2.42699534e-02, 7.66301751e-02, 1.55321255e-01, 2.17939615e-01, 4.48830277e-01, 3.32869619e-01, 6.46899045e-02, 5.61430752e-02, 2.86746323e-02, 1.25387311e-02, 2.85427272e-03, 4.57480550e-04, 1.68058857e-01, 7.46165454e-01, 5.34614563e-01, 4.31294233e-01, 1.71836376e-01, 5.27052581e-03, 2.51790732e-02, 1.20623484e-01, 1.22927919e-01, 5.87635338e-02, 9.09666717e-03, 1.24505162e-03, 9.78703797e-03, 1.80081427e-02, 3.58837843e-03, 1.31649762e-01, 1.49847046e-01, 5.58611155e-02, 8.08890164e-02, 1.23239055e-01, 1.12538531e-01, 1.09684721e-01, 2.63993740e-02, 5.64485788e-03, 1.56059861e-04, 5.24136424e-03, 1.17954418e-01, 1.49613649e-01, ... 1.51704177e-01, 9.84493941e-02, 4.13101166e-02, 2.73425579e-02, 4.11496758e-02, 9.22799408e-02, 2.60631740e-03, 1.46314502e-04, 1.17123127e-04, 8.80983472e-03, 8.88815969e-02, 9.45503116e-02, 3.46885324e-02, 3.57580930e-02, 4.34055179e-02, 9.80604440e-02, 2.00432733e-01, 8.51429701e-02, 5.67653924e-02, 8.52298737e-03, 4.79409099e-03, 3.50505114e-04, 1.65790319e-04, 5.30421734e-04, 9.14469361e-04, 1.81388855e-03, 5.34346700e-03, 2.23447382e-02, 2.38129646e-02, 5.62014282e-02, 1.12864256e-01, 6.75437450e-02, 1.27838194e-01, 6.38682693e-02, 5.41449189e-02, 5.96240610e-02, 5.70862293e-02, 7.16469884e-02, 3.21166962e-02, 5.39456010e-02, 1.17779389e-01, 7.62315542e-02, 5.20691276e-04, 5.88297844e-05, 1.17167830e-04, 3.99127603e-04, 8.31186771e-05, 3.01882625e-04, 3.35916877e-04, 1.56059861e-04, 1.38117373e-03, 3.31127644e-03, 3.93986702e-05, 7.88062811e-04, 3.60235572e-04, 2.97054648e-04, 2.97039747e-04, 2.58132815e-04, 6.32479787e-04, 6.32479787e-04, 3.45647335e-04, 1.31756067e-04, 4.42564487e-05, 1.07407570e-04, 1.12280250e-04, 1.02549791e-04, 1.26838684e-04, 7.34031200e-05, 1.72642767e-02, 1.09888896e-01, 2.62545496e-01, 3.42311025e-01, 9.04908031e-02, 6.92404509e-02, 2.15455323e-01, 1.82337582e-01, 7.28429556e-02, 1.04822367e-02], dtype=float32)
- time(time)datetime64[ns]2020-01-01 ... 2020-12-31
array(['2020-01-01T00:00:00.000000000', '2020-01-02T00:00:00.000000000', '2020-01-03T00:00:00.000000000', ..., '2020-12-29T00:00:00.000000000', '2020-12-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- longitude()float32-16.25
- units :
- degrees_east
- long_name :
- longitude
array(-16.25, dtype=float32)
- latitude()float3228.25
- units :
- degrees_north
- long_name :
- latitude
array(28.25, dtype=float32)
A closer look at the time
dimension shows us that we now have an entry for each day in 2020.
cams_ts_resample.time
<xarray.DataArray 'time' (time: 366)> array(['2020-01-01T00:00:00.000000000', '2020-01-02T00:00:00.000000000', '2020-01-03T00:00:00.000000000', ..., '2020-12-29T00:00:00.000000000', '2020-12-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31 longitude float32 -16.25 latitude float32 28.25
- time: 366
- 2020-01-01 2020-01-02 2020-01-03 ... 2020-12-29 2020-12-30 2020-12-31
array(['2020-01-01T00:00:00.000000000', '2020-01-02T00:00:00.000000000', '2020-01-03T00:00:00.000000000', ..., '2020-12-29T00:00:00.000000000', '2020-12-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- time(time)datetime64[ns]2020-01-01 ... 2020-12-31
array(['2020-01-01T00:00:00.000000000', '2020-01-02T00:00:00.000000000', '2020-01-03T00:00:00.000000000', ..., '2020-12-29T00:00:00.000000000', '2020-12-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- longitude()float32-16.25
- units :
- degrees_east
- long_name :
- longitude
array(-16.25, dtype=float32)
- latitude()float3228.25
- units :
- degrees_north
- long_name :
- latitude
array(28.25, dtype=float32)
Now, we can convert the xarray.DataArray
to a pandas.DataFrame
, as pandas is more efficient to handle time-series data. The function to_dataframe()
easily converts a data array to a dataframe. The resulting dataframe has 366 rows and 3 columns.
cams_ts_df = cams_ts_resample.to_dataframe()
cams_ts_df
longitude | latitude | duaod550 | |
---|---|---|---|
time | |||
2020-01-01 | -16.25 | 28.25 | 0.062677 |
2020-01-02 | -16.25 | 28.25 | 0.121897 |
2020-01-03 | -16.25 | 28.25 | 0.064505 |
2020-01-04 | -16.25 | 28.25 | 0.006812 |
2020-01-05 | -16.25 | 28.25 | 0.001114 |
... | ... | ... | ... |
2020-12-27 | -16.25 | 28.25 | 0.069240 |
2020-12-28 | -16.25 | 28.25 | 0.215455 |
2020-12-29 | -16.25 | 28.25 | 0.182338 |
2020-12-30 | -16.25 | 28.25 | 0.072843 |
2020-12-31 | -16.25 | 28.25 | 0.010482 |
366 rows × 3 columns
3. Combine both annual time-series and visualize both in one plot#
Let us now use the function join()
and combine the two time-series cams_ts_df
and df['AOD_500nm]
. The resulting dataframe has 366 rows and 4 columns.
df_combined = cams_ts_df.join(df['AOD_500nm'])
df_combined
longitude | latitude | duaod550 | AOD_500nm | |
---|---|---|---|---|
time | ||||
2020-01-01 | -16.25 | 28.25 | 0.062677 | 0.094487 |
2020-01-02 | -16.25 | 28.25 | 0.121897 | NaN |
2020-01-03 | -16.25 | 28.25 | 0.064505 | 0.075765 |
2020-01-04 | -16.25 | 28.25 | 0.006812 | 0.098110 |
2020-01-05 | -16.25 | 28.25 | 0.001114 | 0.085672 |
... | ... | ... | ... | ... |
2020-12-27 | -16.25 | 28.25 | 0.069240 | 0.134366 |
2020-12-28 | -16.25 | 28.25 | 0.215455 | 0.415433 |
2020-12-29 | -16.25 | 28.25 | 0.182338 | 0.320463 |
2020-12-30 | -16.25 | 28.25 | 0.072843 | 0.095342 |
2020-12-31 | -16.25 | 28.25 | 0.010482 | 0.033297 |
366 rows × 4 columns
Let us safe the pandas dataframe as csv file. This allows us to easily load the time-series again at a later stage. You can use the function to_csv()
to save a pandas.DataFrame as csv.
df_combined.to_csv("../../eodata/case_study/2020_ts_cams_aeronet.csv", index_label='time')
The last step is now to plot the two columns of the pandas.DataFrame df_combined
as two individual line plots.
# Initiate a figure
fig = plt.figure(figsize=(20,8))
ax = plt.subplot()
# Define the plotting function
ax.plot(df_combined.duaod550, '-', color='blue', label='CAMS global reanalysis (EAC4) - 550 nm')
ax.plot(df_combined.AOD_500nm, '-', color='green', label='AERONET observations - 500 nm')
plt.axhline(y=0.6, color='r', linestyle='dotted', label='PM10 daily limit')
# Customize the title and axes lables
ax.set_title('\nAerosol Optical Depth at 500 / 550 nm - Santa Cruz Tenerife\n', fontsize=20)
ax.set_ylabel(cams_ts.units, fontsize=14)
ax.set_xlabel('\nDay', fontsize=14)
# Customize the fontsize of the axes tickes
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
# Add a gridline to the plot
ax.grid(linestyle='--')
plt.legend(fontsize=14, loc=2)
<matplotlib.legend.Legend at 0x7f13c625a070>
You see in the plot above that the model and the AERONET observations follow a similar annual cycle of AOD in 2020 for the Santa Cruz station in Tenerife. You also see that for higher AOD values measured by AERONET, the CAMS model mostly underpredicts the AOD intensity.