[This lecture was partially created with the support of Artificial Intelligence tools (ChatGPT)]
Introduction
Non-stationary extreme value distributions extend classical extreme value theory to situations where the statistical properties of extremes change over time or depend on external factors. In many real-world systems, the assumption of stationarity—constant distribution parameters over time—is unrealistic. Climate change, urbanization, economic development, and technological shifts can all introduce trends or variability in the behavior of extreme events. For example, the intensity of rainfall extremes may increase with rising temperatures, or financial market crashes may become more frequent during periods of economic instability.
To account for such changes, non-stationary models allow the parameters of extreme value distributions to vary as functions of time or covariates, like for instance temperature or CO2 concentration. Commonly used extreme value models include the Generalized Extreme Value (GEV) distribution for block maxima and the Generalized Pareto Distribution (GPD) for threshold exceedances. In a non-stationary framework, parameters such as location, scale, or shape can be expressed as functions of explanatory variables, such as time, climate indices, or other environmental drivers.
This approach enables researchers to capture evolving risk patterns and better understand how extremes respond to underlying processes. Non-stationary extreme value models are widely used in fields such as hydrology, climate science, engineering, finance, and environmental risk assessment. They are particularly important for estimating return levels—values expected to be exceeded on average once every specified number of years—under changing conditions.
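The return-level concept can be made concrete with a minimal base-R sketch for the Gumbel distribution, which will be used later in this lecture (the function name and the parameter values are illustrative, not from any package):

```r
#Return level x_T for a Gumbel distribution with location mu and scale sigma:
#x_T solves F(x_T) = 1 - 1/T, hence x_T = mu - sigma*log(-log(1 - 1/T))
gumbel_return_level=function(T, mu, sigma){
  mu - sigma*log(-log(1 - 1/T))
}
#Illustrative values: the 100-year level exceeds the 50-year level, as expected
gumbel_return_level(100, mu = 100, sigma = 20)
gumbel_return_level(50, mu = 100, sigma = 20)
```

In a non-stationary setting, \(\mu\) and \(\sigma\) themselves become functions of time, so the return level changes from year to year.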
However, modeling non-stationarity introduces additional challenges. Researchers must select appropriate covariates, ensure sufficient data to estimate time-varying parameters, and avoid overfitting complex models. Careful statistical diagnostics and model validation are therefore essential.
Overall, non-stationary extreme value analysis provides a powerful framework for studying extremes in dynamic systems. By relaxing the assumption of constant statistical behavior, it allows scientists and practitioners to produce more realistic assessments of rare but impactful events in a changing world.
An interesting experiment: a "homemade" non-stationary Gumbel distribution
Let's now try an interesting experiment. With the series of annual maxima, therefore in the domain of the block maxima method, let's try to estimate the parameters of a non-stationary Gumbel distribution by:
- Computing the frequency of non-exceedance of each value and assuming it is a function of time (besides being a function of the value of the random variable);
- Picking random trial values for the parameters of the Gumbel distribution, assuming that they are also functions of time, and computing the probability of each value accordingly. Note: assuming that scale and location depend on time means introducing additional parameters into the body of the distribution;
- Finding the parameter values that minimise the sum of squared differences between the frequency of non-exceedance and the probability (both depending on time).
In doing so, we are again approximating a maximum likelihood estimation as we did here.
Let's try to make this experiment by working with simulated data, with the following R code:
#Define the maxima vector
maxg=c()
#Generate random samples with mean and standard deviation increasing in time, and extract the maxima
for (i in 1:1000)
maxg[i]=max(rnorm(100,0+0.0004*i,1+0.0004*i))
#Compute the frequency of non-exceedance of the maxima
#Note: maxima are kept in the original order, as reordering would destroy the dependence on time
freqne=ppoints(maxg)
#Assign each maximum its plotting position while preserving the time order
freqne[order(maxg)]=freqne
Let's now fit to the simulated maxima a Gumbel distribution with \(\mu\) and \(\sigma\) increasing linearly with time, therefore given by:
\(\mu=\mu_0+\mu_1 t\)
and
\(\sigma=\sigma_0+\sigma_1 t\)
We therefore obtain a distribution with four parameters, \(\mu_0\), \(\mu_1\), \(\sigma_0\) and \(\sigma_1\), to which we assign trial values and then estimate their optimal values by optimising our "homemade" objective, namely minimising the sum of squared differences between the empirical frequencies and the theoretical probabilities.
#Define vector of probability of pmax and parameters
pmax=c()
#Set trial values for the parameters describing the variability of mu and sigma on time
trial=c(mean(maxg),0.1,sd(maxg),0.1)
#Define the function to compute the probabilities with the Gumbel distribution and compute the objective function to be optimised
stgumbel=function(trial,maxg)
{
#Compute probability of maxima according to the Gumbel
for(i in 1:1000)
{
#Set parameters varying in time
mu=trial[1]+trial[2]*i
sigma=trial[3]+trial[4]*i
pmax[i]=exp(-exp(-(maxg[i]-mu)/sigma))
}
diffp=sum((freqne-pmax)^2)
return(diffp)
}
#Find the vector trial that minimises diffp
paramgumbel=optim(trial,fn=stgumbel,maxg=maxg)
Let's check the fit graphically and compute KS statistic.
#Compute probability of maxima according to the estimated Gumbel
for(i in 1:1000)
{
#Set parameters varying in time
mu=paramgumbel$par[1]+paramgumbel$par[2]*i
sigma=paramgumbel$par[3]+paramgumbel$par[4]*i
pmax[i]=exp(-exp(-(maxg[i]-mu)/sigma))
}
#Compare sample probability with theoretical probability graphically
plot(freqne,pmax,xlab="Sample frequency",ylab="Theoretical probability")
grid()
abline(0,1)
diffp=freqne-pmax
#Compute KS statistic
KS=max(abs(diffp))
For the sake of comparison let's now estimate a stationary Gumbel.
#Open a new plot window
dev.new()
#Define vector of probability of pmax and parameters
pmax=c()
#Set trial values for the parameters describing the variability of mu and sigma on time
trial=c(mean(maxg),sd(maxg))
#Define the function to compute the probabilities with the Gumbel distribution and compute the objective function to be optimised
stgumbel=function(trial,maxg)
{
#Compute probability of maxima according to the Gumbel
for(i in 1:1000)
{
#Set parameters varying in time
mu=trial[1]
sigma=trial[2]
pmax[i]=exp(-exp(-(maxg[i]-mu)/sigma))
}
diffp=sum((freqne-pmax)^2)
return(diffp)
}
#Find the vector trial that minimises diffp
paramgumbel1=optim(trial,fn=stgumbel,maxg=maxg)
#Let's check the fit graphically and compute KS statistic
#Compute probability of maxima according to the estimated Gumbel
for(i in 1:1000)
{
#Set parameters varying in time
mu=paramgumbel1$par[1]
sigma=paramgumbel1$par[2]
pmax[i]=exp(-exp(-(maxg[i]-mu)/sigma))
}
#Compare sample probability with theoretical probability graphically
plot(freqne,pmax,xlab="Sample frequency",ylab="Theoretical probability")
grid()
abline(0,1)
diffp1=freqne-pmax
#Compute KS statistic
KS1=max(abs(diffp1))
#Compare KS and KS1 with the critical value for p=0.05
crit=1.36/sqrt(1000)
KS<crit
KS1<crit
After gaining experience with this first application of a non-stationary Gumbel, let's now explore the literature for the general case of the non-stationary GEV.
GEV distribution with time-varying parameters
As we know, the standard GEV distribution is characterized by three parameters: location, scale, and shape. The location parameter determines the central tendency of the extreme values. The scale parameter controls the spread or variability of extremes. The shape parameter governs the behavior of the tail and determines whether extremes are bounded or heavy-tailed.
In a time-varying (non-stationary) GEV model, one or more parameters change as a function of time. Typically, the location parameter is modeled as a linear or nonlinear function of time or external covariates. For example, increasing temperatures associated with Climate Change may lead to a gradual shift in the location parameter of extreme temperature distributions. Similarly, the scale parameter may vary to reflect increasing variability in extremes. Allowing parameters to evolve over time enables the model to capture trends and structural changes in extreme events.
Allowing such variability in time means including additional parameters in the structure of the GEV distribution, namely, we need to add the parameters controlling the change along time of the classical GEV parameters. For instance,
\(\mu(t)=\beta_0+\beta_1t\)
\(\sigma(t)=\exp(\gamma_0+\gamma_1 t)\)
\(\xi(t)=\xi_0\)
Here, the location parameter evolves linearly with time, allowing the center of the distribution of extremes to shift. The exponential form for the scale parameter ensures that \(\sigma(t)>0\) while allowing the variability of extremes to change gradually. In many applications, the shape parameter is kept constant because it is difficult to estimate reliably, although it may also be modeled as time-dependent.
More generally, parameters can depend on external covariates \(z(t)\), such as temperature or climate indices:
\(\mu(t)=\beta_0+\beta_1 z(t)\)
\(\sigma(t)=\exp(\gamma_0+\gamma_1 z(t))\)
In the above equations, \(\beta_0\), \(\beta_1\), \(\gamma_0\) and \(\gamma_1\) are additional parameters to be estimated. Increasing the number of parameters increases the variance of the estimates and, therefore, uncertainty. The extension to the non-stationary case should thus be carefully evaluated depending on data availability, the characteristics of the design project and the underlying knowledge about climate change.
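To make the structure explicit, here is a minimal base-R sketch of the non-stationary GEV cumulative distribution function with the parameterisation above (the function name and the parameter values are illustrative, not from any package):

```r
#Non-stationary GEV CDF: location linear in the covariate z, scale log-linear, shape constant
pgev_ns=function(x, z, beta0, beta1, gamma0, gamma1, xi){
  mu=beta0 + beta1*z            #covariate-dependent location
  sigma=exp(gamma0 + gamma1*z)  #exponential link keeps the scale positive
  s=(x - mu)/sigma
  if(abs(xi) < 1e-8) return(exp(-exp(-s)))   #Gumbel limit for xi -> 0
  arg=1 + xi*s
  if(arg <= 0) return(ifelse(xi > 0, 0, 1))  #outside the support
  exp(-arg^(-1/xi))
}
#Example with illustrative values: the same value x becomes less "extreme"
#as the location shifts upward with the covariate
pgev_ns(10, z=0,  beta0=5, beta1=0.1, gamma0=0, gamma1=0, xi=0.1)
pgev_ns(10, z=50, beta0=5, beta1=0.1, gamma0=0, gamma1=0, xi=0.1)
```

The same value of the variable thus corresponds to a smaller non-exceedance probability as the covariate grows, which is exactly the evolving risk the non-stationary model is meant to capture.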
Operationally, the estimation of a non-stationary GEV can easily be carried out with the help of software. In R we can use the code that follows.
install.packages("ismev")
library(ismev)
# Data. We make reference to the Po river case
poflow=read.table("Po-pontelagoscuro1920-2026.txt", header = FALSE, sep = " ", col.names = c("Date", "Flow"), stringsAsFactors = FALSE, skip=1)
# Convert Date column to Date format
poflow$Date=as.Date(poflow$Date, format = "%d/%m/%Y")
#Extract annual maxima
poflow$Year=format(poflow$Date, "%Y")
annual_max=aggregate(Flow ~ Year, data = poflow, max)
annual_max$Year=as.numeric(annual_max$Year)
flow=annual_max$Flow
year=annual_max$Year
# Fit a non-stationary GEV with time-dependent location parameter. Scale and shape remain constant.
# Note: ydat must be a matrix of covariates; mul=1 makes the location depend on its first column
fit=gev.fit(annual_max$Flow, ydat = matrix(year, ncol = 1), mul = 1)
# mul=1 allows the location parameter to change in time
# View results
fit$mle
fit$se
# Allow both location and scale to vary with time
fit1=gev.fit(annual_max$Flow, ydat = matrix(as.numeric(annual_max$Year), ncol = 1), mul = 1, sigl = 1)
# sigl=1 allows the scale to depend on the year as well
# Diagnostic plots
gev.diag(fit)
This produces:
- QQ plot;
- return level plot;
- residual diagnostics.
Now let's assess the variability of the location parameter in time:
beta0=fit$mle[1]
beta1=fit$mle[2]
mu_t=beta0 + beta1 * year
plot(year, mu_t, type="l",xlab="Year",ylab="Location parameter",main="Time-varying GEV location parameter")
An analogous assessment can be made with fit1 to check the variability in time of the scale parameter.
In summary, a possible modelling strategy may be:
- Fit stationary GEV;
- Fit non-stationary GEV;
- Compare models using likelihood ratio test or AIC.
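The likelihood ratio test in the last step can be sketched in a few lines of base R (assuming, as for ismev's gev.fit, that each fitted object stores the negative log-likelihood; the function name and the numbers are illustrative):

```r
#Likelihood ratio test between a nested (stationary) and a wider (non-stationary) model
#nllh0, nllh1: negative log-likelihoods of the simpler and of the wider model
#df: number of extra parameters in the wider model
lr_test=function(nllh0, nllh1, df){
  D=2*(nllh0 - nllh1)   #deviance statistic, approx. chi-squared with df degrees of freedom
  p=1 - pchisq(D, df)
  c(statistic = D, p.value = p)
}
#Illustrative numbers: a drop of 1.92 in the negative log-likelihood for one extra parameter
lr_test(nllh0 = 500, nllh1 = 498.08, df = 1)  #borderline at the 5% level
```

A small p-value indicates that the extra non-stationary parameters significantly improve the fit.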
Box: the Akaike Information Criterion
The Akaike Information Criterion (AIC) is a statistical measure used for model selection. It helps choose the model that best balances good fit and parsimonious structure, in terms of the number of parameters. Parsimony means reduced uncertainty and is therefore a desirable feature in technical applications.
Suppose we have a statistical model fitted by maximum likelihood. Let \(L\) be the maximum value of the likelihood function for the model and let \(k\) be the number of estimated parameters in the model itself. The AIC is defined as:
\(AIC = 2k - 2\ln(L)\).
The first term \(2k\) penalizes model complexity, while the second term \(-2\ln(L)\) rewards goodness of fit. A model with higher likelihood has a smaller \(-2\ln(L)\), but adding many parameters increases \(k\) and therefore the penalty, thus discouraging overfitting.
When comparing models, the one with the lowest AIC may be preferable, as it seeks a compromise between goodness of fit and robustness. AIC does not measure absolute model quality; it rather measures relative quality.
For small samples, a corrected version is often used, given by:
\(AIC_c = AIC + \frac{2k(k+1)}{n-k-1}\)
As an example of application to the above estimation of the GEV for the Po River, the AIC of the stationary and non-stationary fits can be computed and compared directly from the fitted objects.
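A minimal sketch, assuming the objects fit (time-varying location) and fit1 (time-varying location and scale) returned by ismev's gev.fit above, which store the negative log-likelihood in $nllh (the helper function name is illustrative):

```r
#AIC from a negative log-likelihood: AIC = 2k + 2*nllh, since nllh = -ln(L)
aic_from_nllh=function(nllh, k) 2*k + 2*nllh
#For the Po River fits above (run after the ismev code):
#aic_from_nllh(fit$nllh, k = 4)   #mu0, mu1, sigma, xi
#aic_from_nllh(fit1$nllh, k = 5)  #mu0, mu1, sigma0, sigma1, xi
#The model with the lower AIC is preferred; with an illustrative nllh of 350:
aic_from_nllh(nllh = 350, k = 4)
```

Note how each extra parameter must reduce the negative log-likelihood by at least one unit to pay for itself in AIC terms.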
An alternative approach to fitting a non-stationary GEV distribution is given by the application of Generalized Additive Models for Location, Scale and Shape (GAMLSS), which allow nonlinear trends in time instead of just linear ones. In R this is implemented in the package "gamlss". This approach lets you model each parameter of the GEV (location \(\mu\), scale \(\sigma\), shape \(\xi\)) as smooth functions of time. The code is:
install.packages(c("gamlss","gamlss.dist"))
library(gamlss)
library(gamlss.dist)
# Prepare the data
datapo=data.frame(Flow = annual_max$Flow, Year = as.numeric(annual_max$Year))
#Fit a nonlinear time-dependent GEV. Here the location parameter varies smoothly with time
fit_ns=gamlss(Flow ~ pb(Year), sigma.formula = ~ 1, nu.formula = ~ 1, family = GEV, data = datapo)
#This corresponds to location being a smooth function of year, with scale and shape constant
#If we want to allow the scale to vary with time as well:
fit_ns=gamlss(Flow ~ pb(Year), sigma.formula = ~ pb(Year), nu.formula = ~ 1, family = GEV, data = datapo)
Now, let's move forward with model diagnostic:
plot(fit_ns)
wp(fit_ns) # worm plot diagnostics
#Plot the estimated time-varying location parameter
mu_hat = fitted(fit_ns, "mu")
plot(year, mu_hat, type="l", xlab="Year", ylab="Estimated location parameter", main="Non-stationary GEV location parameter")
# Compare stationary vs non-stationary models (the stationary fit holds all parameters constant)
fit_stat=gamlss(annual_max$Flow ~ 1, sigma.formula = ~ 1, nu.formula = ~ 1, family = GEV, data = datapo)
AIC(fit_stat, fit_ns)
#Lower AIC indicates the better model.
Time varying return level
In the non-stationary GEV model, the return level varies with time (e.g., the 50-year flood changes from year to year). This is often the key result of a non-stationary extreme-value study. The code is as follows.
mu_t=fitted(fit_ns, "mu")
sigma_t=fitted(fit_ns, "sigma")
xi_t=fitted(fit_ns, "nu")
year=annual_max$Year
#Compute return levels. Example for a 50-year return level:
T=50
p=1 - 1/T
RL50=qGEV(p, mu = mu_t, sigma = sigma_t, nu = xi_t)
#Here qGEV() comes from gamlss.dist
#Plot the time-varying return level
plot(year, RL50, type = "l", lwd = 2, xlab = "Year", ylab = "50-year return level", main = "Time-varying 50-year flood estimate")
This plot shows how the estimated 50-year extreme flow changes through time. Now, let's compare different return periods.
RL20 = qGEV(1 - 1/20, mu_t, sigma_t, xi_t)
RL100 = qGEV(1 - 1/100, mu_t, sigma_t, xi_t)
plot(year, RL50, type="l", lwd=2, ylim=range(c(RL20,RL100)), xlab="Year", ylab="Return level", main="Time-varying return levels")
lines(year, RL20, lty=2)
lines(year, RL100, lty=3)
legend("topleft", legend=c("20-year","50-year","100-year"), lty=c(2,1,3), bty="n")
If the curve increases with time, extreme floods are becoming larger; if it is flat, extremes look stationary; if it decreases, extremes are weakening. This approach is widely used in non-stationary flood frequency analysis.
Last modified on March 17, 2026