Extreme Value Analysis using ‘R’

The ‘R’ software is excellent for statistical analysis and many packages provide the same methods for estimating extremes as those outlined here. If you haven’t already, I would certainly recommend getting to grips with ‘R’. It is free and incredibly powerful. It can implement any statistical methodology you are likely to need. If you do give ‘R’ a go, there are numerous packages such as ‘evd’ that implement these extreme value methods. The ‘UKFE’ package has functions to implement extreme value analysis and the UK flood estimation handbook methods (and much more besides).

For example, with the ‘UKFE’ package in ‘R’, to estimate an extreme value such as a 50 year return period (RP) flood using block maximum data (such as annual maximums), the bellow script could be used. If you haven’t already installed the package you first need to enter the following command

install.packages("UKFE")

Then, to load in the package:

library(UKFE)

To bring a small sample of data into the ‘R’ environment, the scan() option can be used:

My.Data <- scan()

After typing the above, press enter, copy and paste your data into R and press enter again. The UKFE package has data and examples that can be used for our purposes. For example, to get the annual maximum sample for river gauge site 69041 (which is on the River Tame at Broomstairs) the ‘GetAM’ function provides a data frame, with dates in the first column and peak flows in the second. The following AMplot function then plots the AM series:

AM69041 <- GetAM(69041)
AMplot(AM69041)

Figure 1: Annual maximum peak flows on the River Tame at Broomstairs

We can then make estimates directly on the data using the ‘GEVAM’ function, or we can first derive the parameters using the GEVPars, which has an option for optimisation with L-moments or maximum likelihood estimation. Estimating the 50-year flow with the GEV:

GEVAM(AM69041$Flow, RP = 50)

The dollar sign is to select the flow column from the data frame. If it was a single sample with no other data (a vector), this would not be needed; for example it could just be GEVAM(My.Data, RP = 50). Where RP is the return period. The usual notation for return period of T is not used because T is a logical statement in ‘R’.  

For estimating the parameters the GEVPars function can be used and the GEVEst function can then be used with the result for the flow estimates;

GEVPars(AM69041$Flow)  

If this is saved as a vector object, we can use it with the GEVEst function:

Pars69041 <- as.numeric(GEVPars(AM69041$Flow))
GEVEst(loc = Pars69041[1], scale = Pars69041[1], shape = Pars69041[1], RP = 50)

The same functions can also be used to estimate return period as a function of flow, using the q argument instead of the RP argument. We could determine, for example, the return period of the maximum flow observed. Firstly, we can see the maximum flow using max(AM69041$Flow) and this script can be used within the GEVAM function:

GEVAM(AM69041$Flow, q = max(AM69041$Flow))

 For quantification of uncertainty using confidence intervals, estimated using the bootstrap method, the ‘UncSS’ function can be used. The default confidence level is 95%.

UncSS(AM69041$Flow, func = GEVAM, RP = 50)

Lastly we can plot a flood frequency curve:

EVPlot(AM69041$Flow, dist = "GEV", scaled = FALSE, Title = "Flood Frequency curve: 69041")

Figure 2: Flood frequency curve for the River Tame at Broomstairs

Similarly for the peaks over threshold (POT) approach the UKFE package has some daily flow data (stored as ThamesPQ) and a function for extracting the peaks over threshold:

ThamesPOT <- POTextract(ThamesPQ[,c(1,3)], thresh = 0.90)

The date and flow data are in columns one and three in the data.frame, hence the “ThamesPQ[,c(1,3)]” in the code. Again, the parameters can be estimated or estimates can be obtained directly from the data. Firstly, to convert from the POT scale to annual scale, we must note the average number of peaks per year in the data (in this case it’s 1.867) and use it in the function.

GenParetoPOT(ThamesPOT$peak, ppy = 1.867, RP = 50)

Where ppy is peaks per year. As with the GEV functions, we can also derive RP as a function of flow and we can also estimate the parameters separately.

Flood Estimation Handbook using UKFE

The UKFE package has many functions of use to the flood hydrologist and the details of them can be viewed within the R environment or via the following pdf:

https://cran.r-project.org/web/packages/UKFE/UKFE.pdf

The below link takes you to the quick guide to implementing the flood estimation handbook (FEH) methods using the UKFE package:

https://www.floodhydrostats.com/ukfe-package-feh-quick-guide