Mann-Kendall Test (mkt)¶
Overview¶
Python module to compute the Mann-Kendall test for trend in time series data.
This module contains a single function test() which implements the Mann-Kendall test for a monotonic trend in a given time series.
Introduction to the Mann-Kendall test¶
The Mann-Kendall test is used to determine whether or not there is a monotonic trend in a given time series. It is a non-parametric test closely related to Kendall’s rank correlation coefficient [1]. The null hypothesis, \(H_0\), states that there is no monotonic trend, and this is tested against one of three possible alternative hypotheses, \(H_a\): (i) there is an upward monotonic trend, (ii) there is a downward monotonic trend, or (iii) there is either an upward or a downward monotonic trend. It is a robust test for trend detection used widely in financial, climatological, hydrological, and environmental time series analysis.
Assumptions underlying the Mann-Kendall test¶
The Mann-Kendall test involves the following assumptions [2] regarding the given time series data:
1. In the absence of a trend, the data are independently and identically distributed (iid).
2. The measurements represent the true states of the observables at the times of measurement.
3. The methods used for sample collection, instrumental measurements and data handling are unbiased.
Advantages of the Mann-Kendall test¶
The Mann-Kendall test provides the following advantages:
1. It does not require the data to follow any particular distribution; e.g., it does not require that the data be normally distributed.
2. It is not affected by missing data, other than the fact that the number of sample points is reduced, which might adversely affect the statistical significance.
3. It is not affected by irregular spacing of the time points of measurement.
4. It is not affected by the length of the time series.
Limitations of the Mann-Kendall test¶
The following limitations have to be kept in mind:
1. The Mann-Kendall test is not suited for data with periodicities (i.e., seasonal effects). In order for the test to be effective, it is recommended that all known periodic effects be removed from the data in a preprocessing step before computing the Mann-Kendall test.
2. The Mann-Kendall test has lower statistical power for shorter datasets, i.e., the longer the time series, the more effective the trend detection.
Formulae¶
The first step in the Mann-Kendall test for a time series \(x_1, x_2, \dots, x_n\) of length \(n\) is to compute the indicator function \(sgn(x_j - x_i)\) such that:
\[\begin{split}sgn(x_j - x_i) &= \begin{cases} 1, & x_j - x_i > 0\\ 0, & x_j - x_i = 0\\ -1, & x_j - x_i < 0 \end{cases},\end{split}\]
which tells us whether the difference between the measurements at times \(j\) and \(i\) (with \(j > i\)) is positive, negative or zero.
Next, we sum this indicator over all pairs of measurements and compute the variance of the resulting statistic. The statistic, denoted \(E[S]\) in this module, is given by:
\[E[S] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sgn(x_j - x_i),\]
and the variance \(VAR(S)\) is given by:
\[VAR(S) = \frac{1}{18} \Big( n(n-1)(2n+5) - \sum_{k=1}^p q_k(q_k-1)(2q_k+5) \Big),\]
where \(p\) is the total number of tie groups in the data, and \(q_k\) is the number of data points contained in the \(k\)-th tie group. For example, if the time series measurements were {12, 56, 23, 12, 67, 45, 56, 56, 10}, we would have two tie groups, for the measurements 12 and 56, i.e. \(p=2\), and the number of data points in these tie groups would be \(q_1=2\) for the tie group of 12s, and \(q_2=3\) for the tie group of 56s.
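As a concrete illustration, the computation of \(E[S]\) and \(VAR(S)\) for the worked example above can be sketched as follows. This is a minimal stand-alone sketch (not the module’s own source), using the pairwise convention \(sgn(x_j - x_i)\) with \(j > i\) and grouping only exact ties:

```python
from collections import Counter

def mk_stats(x):
    """Return the Mann-Kendall sum E[S] and its variance VAR(S),
    including the tie-group correction described above."""
    n = len(x)
    # Sum of sgn(x_j - x_i) over all pairs with j > i.
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # q_k = size of each tie group (values occurring more than once).
    ties = [q for q in Counter(x).values() if q > 1]
    var = (n * (n - 1) * (2 * n + 5)
           - sum(q * (q - 1) * (2 * q + 5) for q in ties)) / 18.0
    return s, var

# Worked example from the text: p = 2 tie groups (the 12s and the 56s).
x = [12, 56, 23, 12, 67, 45, 56, 56, 10]
s, var = mk_stats(x)
```

For this series the tie correction subtracts \(2 \cdot 1 \cdot 9 + 3 \cdot 2 \cdot 11 = 84\) from \(n(n-1)(2n+5) = 1656\), giving \(VAR(S) = 1572/18\).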
Using the mean \(E[S]\) and the variance \(VAR(S)\) we compute the Mann-Kendall test statistic, using the following transformation, which ensures that for large sample sizes, the test statistic \(Z_{MK}\) is distributed approximately normally:
\[\begin{split}Z_{MK} &= \begin{cases} \frac{E[S] - 1} {\sqrt{VAR(S)}}, & E[S] > 0\\ 0, & E[S] = 0\\ \frac{E[S] + 1} {\sqrt{VAR(S)}}, & E[S] < 0\\ \end{cases}.\end{split}\]
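The piecewise transformation above can be sketched as follows (again a stand-alone illustration, not the module’s source; the values plugged in are those obtained for the worked example under the \(sgn(x_j - x_i)\) convention):

```python
import math

def mk_z(s, var):
    """Continuity-corrected Z-score for the Mann-Kendall statistic E[S]."""
    if s > 0:
        return (s - 1) / math.sqrt(var)
    if s < 0:
        return (s + 1) / math.sqrt(var)
    return 0.0

# For the worked example above, E[S] = 2 and VAR(S) = 1572/18:
z = mk_z(2, 1572 / 18)
```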
Hypothesis testing¶
At a significance level \(\alpha\) of the test, which is also the Type I error rate, we decide whether or not to accept the alternative hypothesis \(H_a\) for each variant of \(H_a\) separately:
- \(H_a\): There exists an upward monotonic trend
- If \(Z_{MK} \geq Z_{1 - \alpha}\) then accept \(H_a\), where the notation \(Z_{1 - \alpha}\) denotes the \(100(1-\alpha)\)-th percentile of the standard normal distribution.
- \(H_a\): There exists a downward monotonic trend
- If \(Z_{MK} \leq -Z_{1 - \alpha}\) then accept \(H_a\).
- \(H_a\): There exists either an upward or a downward monotonic trend
- If \(|Z_{MK}| \geq Z_{1 - \alpha/2}\) then accept \(H_a\), where the notation \(|\cdot|\) is used to denote the absolute value function.
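These decision rules can be illustrated with the standard library’s `statistics.NormalDist`, whose `inv_cdf` supplies the percentiles \(Z_{1-\alpha}\); the helper name `mk_decision` is hypothetical, not part of the module’s API:

```python
from statistics import NormalDist

def mk_decision(z_mk, alpha, Ha):
    """Decide whether to accept Ha at significance level alpha."""
    if Ha == 'up':
        return z_mk >= NormalDist().inv_cdf(1 - alpha)
    if Ha == 'down':
        return z_mk <= -NormalDist().inv_cdf(1 - alpha)
    if Ha == 'upordown':
        return abs(z_mk) >= NormalDist().inv_cdf(1 - alpha / 2)
    raise ValueError("Ha must be 'up', 'down' or 'upordown'")

# The familiar critical values appear as special cases:
z_95 = NormalDist().inv_cdf(0.95)    # one-sided, alpha = 0.05
z_975 = NormalDist().inv_cdf(0.975)  # two-sided, alpha = 0.05
```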
Updated formulae for implementation¶
One crucial notion involved in the Mann-Kendall test statistic is whether the difference between two measurements is greater than, equal to, or less than zero. This decision is in turn critically linked to the least count (i.e., the smallest difference in the measured quantity that the measurement process can resolve) of the time series measurements \(x_i\). For example, suppose we measure \(x_i\) with a precision \(\varepsilon = 0.01\). Floating point errors in the stored values of \(x_i\) might then lead to, say, \(x_{11} - x_{27} = 0.000251 > 0\). However, declaring this difference to be greater than zero is meaningless, because the measurement process used on \(x\) could never ascertain such a small difference. This is why, in this implementation of the Mann-Kendall test, the least count error \(\varepsilon\) is a compulsory input for the test statistic estimation.
This allows us to revise the above formulae for the Mann-Kendall test as:
\[\begin{split}sgn(x_j - x_i) &= \begin{cases} 1, & x_j - x_i > \varepsilon\\ 0, & |x_j - x_i| \leq \varepsilon\\ -1, & x_j - x_i < -\varepsilon \end{cases},\end{split}\]
and:
\[\begin{split}Z_{MK} &= \begin{cases} \frac{E[S] - 1} {\sqrt{VAR(S)}}, & E[S] > \varepsilon\\ 0, & |E[S]| \leq \varepsilon\\ \frac{E[S] + 1} {\sqrt{VAR(S)}}, & E[S] < -\varepsilon\\ \end{cases}.\end{split}\]
These revised formulae are the ones implemented in the test() function of this module.
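The epsilon-aware sign function can be sketched as follows (`sgn_eps` is an illustrative name, not part of the module’s API):

```python
def sgn_eps(diff, eps):
    """Epsilon-aware sign: differences whose magnitude is within the
    least count error eps are treated as ties (zero)."""
    if diff > eps:
        return 1
    if diff < -eps:
        return -1
    return 0

# A floating-point difference smaller than the least count
# (eps = 0.01 in the example above) counts as a tie:
tie = sgn_eps(0.000251, 0.01)   # -> 0
```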
Additional estimates¶
In addition to the result of the Mann-Kendall test, which is a string indicating whether or not to accept the alternative hypothesis, the test() function also returns a few additional estimates related to the monotonic trend in the time series.
Estimation of the simple linear regression parameters¶
The slope \(m\) and intercept \(c\) of a straight line fitted through the time series data are estimated as follows:
\[m = r_{x,t} \frac{\sigma_x}{\sigma_t},\]
where \(r_{x,t}\) is the Pearson cross-correlation coefficient between \(x\) and \(t\), and \(\sigma_x\) and \(\sigma_t\) are their respective standard deviations.
\[c = \mu_x - m \mu_t\]
where \(\mu_x\) and \(\mu_t\) denote the means of \(x\) and \(t\), respectively.
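A stand-alone sketch of these two estimates, assuming \(t\) is the array of time points (the `linear_fit` helper is hypothetical, not the module’s source):

```python
import math

def linear_fit(t, x):
    """Slope m = r_{x,t} * sigma_x / sigma_t and intercept c = mu_x - m * mu_t."""
    n = len(t)
    mu_t = sum(t) / n
    mu_x = sum(x) / n
    cov = sum((ti - mu_t) * (xi - mu_x) for ti, xi in zip(t, x))
    ss_t = sum((ti - mu_t) ** 2 for ti in t)
    ss_x = sum((xi - mu_x) ** 2 for xi in x)
    r = cov / math.sqrt(ss_t * ss_x)                 # Pearson correlation
    m = r * math.sqrt(ss_x) / math.sqrt(ss_t)        # algebraically cov / ss_t
    c = mu_x - m * mu_t
    return m, c

t = list(range(9))
x = [12, 56, 23, 12, 67, 45, 56, 56, 10]
m, c = linear_fit(t, x)
```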
Estimation of \(p\)-values¶
The test()
function also returns the \(p\)-values for the given
dataset under the various alternative hypotheses. Note that the estimation of
the \(p\)-value is not essential to the computation of the test results as
formulated above. The \(p\)-values need to be estimated separately depending on the type of alternative hypothesis used and the sign of \(E[S]\).
Denoting \(f(u)\) as the probability density function of the standard
normal distribution, we can write down the \(p\)-values as:
- \(H_a\): There exists an upward monotonic trend
\[\begin{split}p_{Z_{MK}} &= \begin{cases} \int_{Z_{MK}}^{\infty} f(u) \mathrm{d}u,& |E[S]|>\varepsilon\\ 0.5, & |E[S]| \leq \varepsilon\\ \end{cases}.\end{split}\]
- \(H_a\): There exists a downward monotonic trend
\[\begin{split}p_{Z_{MK}} &= \begin{cases} \int^{Z_{MK}}_{-\infty} f(u) \mathrm{d}u,& |E[S]|>\varepsilon\\ 0.5, & |E[S]| \leq \varepsilon\\ \end{cases}.\end{split}\]
- \(H_a\): There exists either an upward or a downward monotonic trend
\[\begin{split}p_{Z_{MK}} &= 2 \begin{cases} \int_{Z_{MK}}^{\infty} f(u) \mathrm{d}u,& E[S]>\varepsilon\\ 0.5, & |E[S]| \leq \varepsilon\\ \int^{Z_{MK}}_{-\infty} f(u) \mathrm{d}u,& E[S]<-\varepsilon\\ \end{cases}.\end{split}\]
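These p-value cases can be sketched with `statistics.NormalDist`, whose `cdf` plays the role of the integral of \(f(u)\), using the standard convention that the two-sided p-value doubles the one-sided tail; the `mk_pvalue` helper is illustrative, not the module’s source:

```python
from statistics import NormalDist

def mk_pvalue(z_mk, s, eps, Ha):
    """p-value of the Mann-Kendall Z-score under the chosen Ha.
    The tie case |E[S]| <= eps is handled explicitly."""
    upper = 1.0 - NormalDist().cdf(z_mk)  # integral from Z_MK to +infinity
    lower = NormalDist().cdf(z_mk)        # integral from -infinity to Z_MK
    if Ha == 'up':
        return 0.5 if abs(s) <= eps else upper
    if Ha == 'down':
        return 0.5 if abs(s) <= eps else lower
    if Ha == 'upordown':
        return 1.0 if abs(s) <= eps else 2.0 * (upper if s > eps else lower)
    raise ValueError("Ha must be 'up', 'down' or 'upordown'")

p_up = mk_pvalue(1.0, 5, 1e-6, 'up')
p_two = mk_pvalue(1.0, 5, 1e-6, 'upordown')
```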
References
[1] Pohlert, T. “Non-Parametric Trend Tests and Change-Point Detection”. R-package trend. Accessed on: 17 April, 2017.
[2] “Mann-Kendall Test For Monotonic Trend”. Visual Sample Plan. Accessed on: 17 April, 2017.
Functions¶
- mkt.test(t, x, eps=None, alpha=None, Ha=None)[source]¶
Runs the Mann-Kendall test for trend in time series data.
Parameters: - t (1D numpy.ndarray) – array of the time points of measurements
- x (1D numpy.ndarray) – array containing the measurements corresponding to entries of ‘t’
- eps (scalar, float, greater than zero) – least count error of measurements, which helps determine ties in the data
- alpha (scalar, float, greater than zero) – significance level of the statistical test (Type I error)
- Ha (string, options include 'up', 'down', 'upordown') – type of test: one-sided (‘up’ or ‘down’) or two-sided (‘upordown’)
Returns: - MK (string) – result of the statistical test indicating whether or not to accept the alternative hypothesis ‘Ha’
- m (scalar, float) – slope of the linear fit to the data
- c (scalar, float) – intercept of the linear fit to the data
- p (scalar, float, greater than zero) – p-value of the obtained Z-score statistic for the Mann-Kendall test
Raises: - AssertionError – least count error of measurements ‘eps’ is not given
- AssertionError – significance level of test ‘alpha’ is not given
- AssertionError – alternative hypothesis ‘Ha’ is not given
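Putting the pieces together, a hypothetical re-implementation of the documented interface might look as follows. This is a sketch under stated assumptions, not the module’s actual source: the result strings 'accept Ha' / 'reject Ha' are illustrative placeholders, and for simplicity only exact ties are grouped in the variance correction.

```python
import math
from collections import Counter
from statistics import NormalDist

def mk_test(t, x, eps=None, alpha=None, Ha=None):
    """Hypothetical sketch mirroring the documented signature.
    Returns (MK, m, c, p)."""
    assert eps is not None, "least count error 'eps' is not given"
    assert alpha is not None, "significance level 'alpha' is not given"
    assert Ha in ('up', 'down', 'upordown'), "alternative hypothesis 'Ha' is not given"
    n = len(x)
    # E[S] using the eps-aware sign function from the revised formulae.
    s = sum((x[j] - x[i] > eps) - (x[j] - x[i] < -eps)
            for i in range(n - 1) for j in range(i + 1, n))
    # Tie correction (exact ties only, for simplicity).
    ties = [q for q in Counter(x).values() if q > 1]
    var = (n * (n - 1) * (2 * n + 5)
           - sum(q * (q - 1) * (2 * q + 5) for q in ties)) / 18.0
    # Continuity-corrected Z-score.
    z = 0.0 if abs(s) <= eps else (s - math.copysign(1, s)) / math.sqrt(var)
    nd = NormalDist()
    if Ha == 'up':
        accept = z >= nd.inv_cdf(1 - alpha)
        p = 0.5 if abs(s) <= eps else 1.0 - nd.cdf(z)
    elif Ha == 'down':
        accept = z <= -nd.inv_cdf(1 - alpha)
        p = 0.5 if abs(s) <= eps else nd.cdf(z)
    else:
        accept = abs(z) >= nd.inv_cdf(1 - alpha / 2)
        p = 1.0 if abs(s) <= eps else 2.0 * min(nd.cdf(z), 1.0 - nd.cdf(z))
    MK = 'accept Ha' if accept else 'reject Ha'
    # Linear fit: m = r_{x,t} * sigma_x / sigma_t reduces to cov / var_t.
    mu_t, mu_x = sum(t) / n, sum(x) / n
    m = (sum((ti - mu_t) * (xi - mu_x) for ti, xi in zip(t, x))
         / sum((ti - mu_t) ** 2 for ti in t))
    c = mu_x - m * mu_t
    return MK, m, c, p

MK, m, c, p = mk_test(list(range(9)), [12, 56, 23, 12, 67, 45, 56, 56, 10],
                      eps=1e-6, alpha=0.05, Ha='upordown')
```

For the worked example the trend is far from significant at \(\alpha = 0.05\) (\(|Z_{MK}| \approx 0.107\)), so the two-sided test does not accept \(H_a\).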