Bounded Power Law (BPL)

Overview

Python module to work with bounded power-law (BPL) distributed random variables.

This module contains a few basic functions that help analyse power-law distributed random variables that can be either bounded only from below or from both above and below. It provides the following functions:

  1. Generate random samples from power-law distributed random variables.
  2. Estimate a logarithimically binned histogram for a power-law distributed random variable.
  3. Estimate the probability density function and the cumulative distribution function for specified power-law exponent.

Introduction to BPL

The probability density function (PDF) of a power-law variable is at best bounded from below to avoid divergence of the density near zero. In this case, the power-law distribution is defined on the support \((x_{min}, {\infty})\).Moreover, some real-world systems may have a power-law regime in which case the distribution is bounded from both above and below, and will have the support \((x_{min}, x_{max})\).

Assuming a continuous random variable distributed according to a power law with exponent \(\alpha > 1\), the PDFs for the two cases above are estimated by following the steps in [1], using which the cumulative distribution function (CDF) for the two cases are also easily estimated. The functional forms of the PDFs and the CDFs are given below, and are also implemented in this module as pdf() and cdf().

In order to generate random samples for a power-law distributed random variable, this module uses the inverse transform method [2] which relies on the inverse function of the CDF of the random variable under consideration. The inverse funcational form of the CDfs of the two power-law distribution types considered here are also listed below. The random sampling routine is implemented in sample().

Formulae

1. Power-law distributed on \((x_{min}, \infty)\):

PDF:

\[p(x) = - \frac{\beta}{x_{min}^{\beta}} x^{-\alpha}\]

CDF:

\[P(x) = - \frac{x_{min}^{\beta} - x^{\beta}}{x_{min}^{\beta}}\]

Inverse of CDF:

\[P^{-1}(x) = - x_{min} (1 - x)^{1 / \beta}\]

2. Power-law distributed on \((x_{min}, x_{max})\)

PDF:

\[p(x) = \frac{\beta}{x_{max}^{\beta} - x_{min}^{\beta}}x^{-\alpha}\]

CDF:

\[P(x) = \frac{x^{\beta} - x_{min}^{\beta}} {x_{max}^{\beta} - x_{min}^{\beta}}.\]

Inverse of CDF:

\[ \begin{align}\begin{aligned}P^{-1}(x) &= (a + bx) ^ {1 / \beta}\\a &= x_{min}^{\beta}\\b &= x_{max}^{\beta} - a\end{aligned}\end{align} \]

where we have defined \(\beta = 1 - \alpha\) for a more concise notation. Note that the inverse transform method used here recommended by Clauset et al. in Sec. II B of [1] in order to generate random samples according the given power-law exponent. Also, note that they provide a separate set of formulae for discrete power-law variables which are not considered here. Thus, care should be taken before extending the results of code implemented in this module to discrete power-law distributed random variables.

References

[1](1, 2) Clauset, A., Shalizi, C. R., Newman, M. E. J. “Power-law Distributions in Empirical Data”. SFI Working Paper: 2007-12-049 (2007). http://www.santafe.edu/media/workingpapers/07-12-049.pdf
[2]“Inverse transform sampling”. Wikipedia. Accessed on: 23 May, 2016. https://en.wikipedia.org/wiki/Inverse_transform_sampling

Functions

bpl.pdf(x, alpha, xmin, xmax=None)[source]

Returns the PDF of a (bounded) power-law variable.

Parameters:
  • x (numpy.ndarray of 1-dimension) – array of values on which the power-law PDF is to be evaluated
  • alpha (scalar, float, greater than 1) – power-law exponent
  • xmin (scalar, float) – lower bound of the power-law random variable (greater than zero)
  • xmax (scalar, float, optional) – upper bound of the power-law random variable
Returns:

pdf – array of the probability densities evaluated for each point in x.

Return type:

numpy.ndarray with shape = (len(x), )

See also

cdf(), sample()

bpl.cdf(x, alpha, xmin, xmax=None)[source]

Returns the CDF of a (bounded) power-law variable.

Parameters:
  • x (numpy.ndarray of 1-dimension) – array of values on which the power-law CDF is to be evaluated
  • alpha (scalar, float, greater than 1) – power-law exponent
  • xmin (scalar, float) – lower bound of the power-law random variable (greater than zero)
  • xmax (scalar, float, optional) – upper bound of the power-law random variable
Returns:

cdf – array of the cumulative probabilities evaluated for each point in x.

Return type:

numpy.ndarray with shape = (len(x), )

See also

pdf(), sample()

bpl.sample(alpha=2.5, size=1000, xmin=1, xmax=None)[source]

Generates a random sample from a (bounded) power-law distribution.

Parameters:
  • alpha (scalar, float, greater than 1) – power-law exponent
  • size (scalar, integer) – size (i.e. length) of the random sample to be generated
  • xmin (scalar, float) – lower bound of the power-law random variable (greater than zero)
  • xmax (scalar, float, optional) – upper bound of the power-law random variable
Returns:

s – random deviates from a power-law distribution with exponent alpha and with upper and lower bounds as specified

Return type:

numpy.ndarray with shape = (size, )

See also

histogram(), pdf()

bpl.histogram(arr, bins=None, plot=False, ax=None, **kwargs)[source]

Plots a power law histogram using logarithmic binning.

Parameters:
  • arr (numpy.ndarray of 1-dimension) – sample of a power-law distributed random deviates
  • bins (numpy.ndarray of 1-dimension, optional) – bins for which the histogram counts are obtained. This should be of shape = (len(arr) + 1, ). If not provided, an array of logarithmically spaced bins is estimated using Sturges’ formula [3].
  • plot (boolean) – If True, the histogram results are plotted on given axes or on a newly created matplotlib.axes object. Default = False.
  • ax (matplotlib.axes, optional) – axes on which the estimated histogram is to be plotted. If not provided, a standard matplotlib.axes object is created for plotting.
Returns:

  • hist (numpy.ndarray of 1 dimension) – array containing the histogram counts/fractions for the given random sample. This is of length n, where n is the number of bins given by Sturges formula.
  • bins (numpy.ndarray of 1 dimension) – array containing the bin edges for the given random sample obtained by constructing n number of logarithmically spaced bins.
  • ax (matplotlib.axes) – axes on which the histogram is plotted

Notes

The kwargs arguments are partly passed on to numpy.histogram (such as the density keyword argument), and to matplotlib.pyplot.plot (such as the mec, mfc, and ms keywords), and finally to also specify axes labels and their fontsizes using xlab and fs keyword arguments. These could also, in principle, be done separately outside of this function by using the necessary methods of the ax object.

See also

pdf(), sample()

References

[3]Sturges, H. A. (1926). “The choice of a class interval”. Journal of the American Statistical Association, 21(153), 65-66. http://www.tandfonline.com/doi/abs/10.1080/01621459.1926.10502161