Source code for bpl

"""
Python module to work with bounded power-law (BPL) distributed random variables.

This module contains a few basic functions that help analyse power-law
distributed random variables that can be either bounded only from below or
from both above and below. It provides the following functions:

1.  Generate random samples from power-law distributed random variables.
2.  Estimate a logarithimically binned histogram for a power-law
    distributed random variable.
3.  Estimate the probability density function and the cumulative
    distribution function for specified power-law exponent.

Introduction to BPL
-------------------

The probability density function (PDF) of a power-law variable is at best
bounded from below to avoid divergence of the density near zero. In this case,
the power-law distribution is defined on the support
:math:`(x_{min}, {\\infty})`.Moreover, some real-world systems may have a
power-law regime in which case the distribution is bounded from both above and
below, and will have the support :math:`(x_{min}, x_{max})`.

Assuming a continuous random variable distributed according to a power law
with exponent :math:`\\alpha > 1`, the PDFs for the two cases above are
estimated by following the steps in [1]_, using which the cumulative
distribution function (CDF) for the two cases are also easily estimated. The
functional forms of the PDFs and the CDFs are given below, and are also
implemented in this module as :py:func:`pdf` and :py:func:`cdf`.

In order to generate random samples for a power-law distributed random
variable, this module uses the inverse transform method [2]_ which relies on the
inverse function of the CDF of the random variable under consideration. The
inverse funcational form of the CDfs of the two power-law distribution types
considered here are also listed below. The random sampling routine is
implemented in :py:func:`sample`.

Formulae
--------
1. Power-law distributed on :math:`(x_{min}, \\infty)`:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

    PDF:

    .. math::
        p(x) = - \\frac{\\beta}{x_{min}^{\\beta}} x^{-\\alpha}

    CDF:

    .. math::
        P(x) = - \\frac{x_{min}^{\\beta} - x^{\\beta}}{x_{min}^{\\beta}}

    Inverse of CDF:

    .. math::
        P^{-1}(x) = - x_{min} (1 - x)^{1 / \\beta}

2. Power-law distributed on :math:`(x_{min}, x_{max})`
++++++++++++++++++++++++++++++++++++++++++++++++++++++

    PDF:

    .. math::
        p(x) = \\frac{\\beta}{x_{max}^{\\beta} - x_{min}^{\\beta}}x^{-\\alpha}

    CDF:

    .. math::
        P(x) = \\frac{x^{\\beta} - x_{min}^{\\beta}}
                     {x_{max}^{\\beta} - x_{min}^{\\beta}}.

    Inverse of CDF:

    .. math::
        P^{-1}(x) &= (a + bx) ^ {1 / \\beta}

        a &= x_{min}^{\\beta}

        b &= x_{max}^{\\beta} - a

where we have defined :math:`\\beta = 1 - \\alpha` for a more concise notation.
Note that the inverse transform method used here recommended by Clauset et al.
in Sec. II B of [1]_ in order to generate random samples according the given
power-law exponent. Also, note that they provide a separate set of formulae
for discrete power-law variables which are not considered here. Thus, care
should be taken before extending the results of code implemented in this module
to discrete power-law distributed random variables.


References
----------
.. [1]  Clauset, A., Shalizi, C. R., Newman, M. E. J.
        "Power-law Distributions in Empirical Data".
        SFI Working Paper: 2007-12-049 (2007).
        http://www.santafe.edu/media/workingpapers/07-12-049.pdf

.. [2]  "Inverse transform sampling".
        Wikipedia. Accessed on: 23 May, 2016.
        https://en.wikipedia.org/wiki/Inverse_transform_sampling

"""
# Created: Mon May 23, 2016  02:56PM
# Last modified: Wed May 25, 2016  11:41AM
# Copyright: Bedartha Goswami <goswami@uni-potsdam.de>


import numpy as np
import matplotlib.pyplot as pl

__all__ = ["sample", "pdf", "cdf", "histogram"]


[docs]def sample(alpha=2.5, size=1000, xmin=1, xmax=None): """ Generates a random sample from a (bounded) power-law distribution. Parameters ---------- alpha : scalar, float, greater than 1 power-law exponent size : scalar, integer size (i.e. length) of the random sample to be generated xmin : scalar, float lower bound of the power-law random variable (greater than zero) xmax : scalar, float, optional upper bound of the power-law random variable Returns ------- s : numpy.ndarray with ``shape = (size, )`` random deviates from a power-law distribution with exponent ``alpha`` and with upper and lower bounds as specified See Also -------- histogram pdf """ assert alpha > 1, "Power law exponent should be greater than 1!" beta = 1. - alpha r = np.random.rand(size) if not xmax: s = xmin * ((1. - r) ** (1 / beta)) else: a = xmin ** beta b = xmax ** beta - a s = (a + b * r) ** (1. / beta) return s
[docs]def pdf(x, alpha, xmin, xmax=None): """ Returns the PDF of a (bounded) power-law variable. Parameters ---------- x : numpy.ndarray of 1-dimension array of values on which the power-law PDF is to be evaluated alpha : scalar, float, greater than 1 power-law exponent xmin : scalar, float lower bound of the power-law random variable (greater than zero) xmax : scalar, float, optional upper bound of the power-law random variable Returns ------- pdf : numpy.ndarray with ``shape = (len(x), )`` array of the probability densities evaluated for each point in ``x``. See Also -------- cdf sample """ assert alpha > 1, "Power law exponent should be greater than 1!" beta = 1. - alpha if not xmax: k = - beta / xmin ** beta pdf = k * (x ** -alpha) else: k = xmax ** beta - xmin ** beta pdf = (beta / k) * (x ** -alpha) return pdf
[docs]def cdf(x, alpha, xmin, xmax=None): """ Returns the CDF of a (bounded) power-law variable. Parameters ---------- x : numpy.ndarray of 1-dimension array of values on which the power-law CDF is to be evaluated alpha : scalar, float, greater than 1 power-law exponent xmin : scalar, float lower bound of the power-law random variable (greater than zero) xmax : scalar, float, optional upper bound of the power-law random variable Returns ------- cdf : numpy.ndarray with ``shape = (len(x), )`` array of the cumulative probabilities evaluated for each point in ``x``. See Also -------- pdf sample """ assert alpha > 1, "Power law exponent should be greater than 1!" beta = 1. - alpha if not xmax: k = 1. cdf = k - (x / xmin) ** beta else: k = xmax ** beta - xmin ** beta cdf = (x ** beta - xmin ** beta) / k return cdf
[docs]def histogram(arr, bins=None, plot=False, ax=None, **kwargs): """ Plots a power law histogram using logarithmic binning. Parameters ---------- arr : numpy.ndarray of 1-dimension sample of a power-law distributed random deviates bins : numpy.ndarray of 1-dimension, optional bins for which the histogram counts are obtained. This should be of ``shape = (len(arr) + 1, )``. If not provided, an array of logarithmically spaced bins is estimated using Sturges' formula [3]_. plot : boolean If ``True``, the histogram results are plotted on given axes or on a newly created ``matplotlib.axes`` object. Default ``= False``. ax : matplotlib.axes, optional axes on which the estimated histogram is to be plotted. If not provided, a standard ``matplotlib.axes`` object is created for plotting. Returns ------- hist : numpy.ndarray of 1 dimension array containing the histogram counts/fractions for the given random sample. This is of length ``n``, where ``n`` is the number of bins given by Sturges formula. bins : numpy.ndarray of 1 dimension array containing the bin edges for the given random sample obtained by constructing ``n`` number of logarithmically spaced bins. ax : matplotlib.axes axes on which the histogram is plotted Notes ----- The **kwargs** arguments are partly passed on to ``numpy.histogram`` (such as the ``density`` keyword argument), and to ``matplotlib.pyplot.plot`` (such as the ``mec``, ``mfc``, and ``ms`` keywords), and finally to also specify axes labels and their fontsizes using ``xlab`` and ``fs`` keyword arguments. These could also, in principle, be done separately outside of this function by using the necessary methods of the ``ax`` object. See Also -------- pdf sample References ---------- .. [3] Sturges, H. A. (1926). "The choice of a class interval". Journal of the American Statistical Association, 21(153), 65-66. http://www.tandfonline.com/doi/abs/10.1080/01621459.1926.10502161 """ # argument parsing density = False mec, mfc, ms = "none", "Black", 5 xlab, fs = "Observable (Units)", 12 for key in kwargs: exec("%s = kwargs[key]" % key) if not bins: bins = _logbins(arr) # histogram counts hist = np.histogram(arr, bins=bins, density=density)[0] # histogram plot if plot: # create axes if not given if not ax: fig = pl.figure(figsize=[8.5, 6.5]) ax = fig.add_subplot(111) # plot histogram results mid_pts = 0.5 * (bins[:-1] + bins[1:]) ax.plot(mid_pts, hist, "o", mec=mec, mfc=mfc, ms=ms) ax.set_xscale("log") ax.set_yscale("log") ax.grid(which="both", axis="both") ax.set_xlabel(xlab, fontsize=fs) ax.set_ylabel("Probability", fontsize=fs) # return histogram results and axis object for further plotting if plot: return hist, bins, ax else: return hist, bins
def _logbins(arr): """ Returns an array of logarithmically spaced bins for given array. """ n = len(arr) nbins = np.ceil(np.log2(n)) + 1 # Sturges formula arrmin = arr[arr != 0.].min() arrmax = arr.max() bins = np.logspace(np.log10(arrmin), np.log10(arrmax), num=nbins, base=10) return bins