Source code for bpl

"""
Python module to work with bounded power-law (BPL) distributed random variables.

This module contains a few basic functions that help analyse power-law
distributed random variables that can be either bounded only from below or
from both above and below. It provides the following functions:

1.  Generate random samples from power-law distributed random variables.
2.  Estimate a logarithimically binned histogram for a power-law
    distributed random variable.
3.  Estimate the probability density function and the cumulative
    distribution function for specified power-law exponent.

Introduction to BPL
-------------------

The probability density function (PDF) of a power-law variable is at best
bounded from below to avoid divergence of the density near zero. In this case,
the power-law distribution is defined on the support
:math:`(x_{min}, {\\infty})`.Moreover, some real-world systems may have a
power-law regime in which case the distribution is bounded from both above and
below, and will have the support :math:`(x_{min}, x_{max})`.

Assuming a continuous random variable distributed according to a power law
with exponent :math:`\\alpha > 1`, the PDFs for the two cases above are
estimated by following the steps in [1]_, using which the cumulative
distribution function (CDF) for the two cases are also easily estimated. The
functional forms of the PDFs and the CDFs are given below, and are also
implemented in this module as :py:func:`pdf` and :py:func:`cdf`.

In order to generate random samples for a power-law distributed random
variable, this module uses the inverse transform method [2]_ which relies on the
inverse function of the CDF of the random variable under consideration. The
inverse funcational form of the CDfs of the two power-law distribution types
considered here are also listed below. The random sampling routine is
implemented in :py:func:`sample`.

Formulae
--------
1. Power-law distributed on :math:`(x_{min}, \\infty)`:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

    PDF:

    .. math::
        p(x) = - \\frac{\\beta}{x_{min}^{\\beta}} x^{-\\alpha}

    CDF:

    .. math::
        P(x) = - \\frac{x_{min}^{\\beta} - x^{\\beta}}{x_{min}^{\\beta}}

    Inverse of CDF:

    .. math::
        P^{-1}(x) = - x_{min} (1 - x)^{1 / \\beta}

2. Power-law distributed on :math:`(x_{min}, x_{max})`
++++++++++++++++++++++++++++++++++++++++++++++++++++++

    PDF:

    .. math::
        p(x) = \\frac{\\beta}{x_{max}^{\\beta} - x_{min}^{\\beta}}x^{-\\alpha}

    CDF:

    .. math::
        P(x) = \\frac{x^{\\beta} - x_{min}^{\\beta}}
                     {x_{max}^{\\beta} - x_{min}^{\\beta}}.

    Inverse of CDF:

    .. math::
        P^{-1}(x) &= (a + bx) ^ {1 / \\beta}

        a &= x_{min}^{\\beta}

        b &= x_{max}^{\\beta} - a

where we have defined :math:`\\beta = 1 - \\alpha` for a more concise notation.
Note that the inverse transform method used here recommended by Clauset et al.
in Sec. II B of [1]_ in order to generate random samples according the given
power-law exponent. Also, note that they provide a separate set of formulae
for discrete power-law variables which are not considered here. Thus, care
should be taken before extending the results of code implemented in this module
to discrete power-law distributed random variables.


References
----------
.. [1]  Clauset, A., Shalizi, C. R., Newman, M. E. J.
        "Power-law Distributions in Empirical Data".
        SFI Working Paper: 2007-12-049 (2007).
        http://www.santafe.edu/media/workingpapers/07-12-049.pdf

.. [2]  "Inverse transform sampling".
        Wikipedia. Accessed on: 23 May, 2016.
        https://en.wikipedia.org/wiki/Inverse_transform_sampling

"""
# Created: Mon May 23, 2016  02:56PM
# Last modified: Wed May 25, 2016  11:41AM
# Copyright: Bedartha Goswami <goswami@uni-potsdam.de>


import numpy as np
import matplotlib.pyplot as pl

__all__ = ["sample", "pdf", "cdf", "histogram"]


[docs]def sample(alpha=2.5, size=1000, xmin=1, xmax=None):
    """
    Generates a random sample from a (bounded) power-law distribution.

    Parameters
    ----------
    alpha : scalar, float, greater than 1
        power-law exponent
    size : scalar, integer
        size (i.e. length) of the random sample to be generated
    xmin : scalar, float
        lower bound of the power-law random variable (greater than zero)
    xmax : scalar, float, optional
        upper bound of the power-law random variable

    Returns
    -------
    s : numpy.ndarray with ``shape = (size, )``
        random deviates from a power-law distribution with exponent ``alpha``
        and with upper and lower bounds as specified

    See Also
    --------
    histogram
    pdf

    """
    assert alpha > 1, "Power law exponent should be greater than 1!"
    beta = 1. - alpha
    r = np.random.rand(size)
    if not xmax:
        s = xmin * ((1. - r) ** (1 / beta))
    else:
        a = xmin ** beta
        b = xmax ** beta - a
        s = (a + b * r) ** (1. / beta)
    return s


[docs]def pdf(x, alpha, xmin, xmax=None):
    """
    Returns the PDF of a (bounded) power-law variable.

    Parameters
    ----------
    x : numpy.ndarray of 1-dimension
        array of values on which the power-law PDF is to be evaluated
    alpha : scalar, float, greater than 1
        power-law exponent
    xmin : scalar, float
        lower bound of the power-law random variable (greater than zero)
    xmax : scalar, float, optional
        upper bound of the power-law random variable

    Returns
    -------
    pdf : numpy.ndarray with ``shape = (len(x), )``
        array of the probability densities evaluated for each point in ``x``.

    See Also
    --------
    cdf
    sample

    """
    assert alpha > 1, "Power law exponent should be greater than 1!"
    beta = 1. - alpha
    if not xmax:
        k = - beta / xmin ** beta
        pdf = k * (x ** -alpha)
    else:
        k = xmax ** beta - xmin ** beta
        pdf = (beta / k) * (x ** -alpha)
    return pdf


[docs]def cdf(x, alpha, xmin, xmax=None):
    """
    Returns the CDF of a (bounded) power-law variable.

    Parameters
    ----------
    x : numpy.ndarray of 1-dimension
        array of values on which the power-law CDF is to be evaluated
    alpha : scalar, float, greater than 1
        power-law exponent
    xmin : scalar, float
        lower bound of the power-law random variable (greater than zero)
    xmax : scalar, float, optional
        upper bound of the power-law random variable

    Returns
    -------
    cdf : numpy.ndarray with ``shape = (len(x), )``
        array of the cumulative probabilities evaluated for each point in ``x``.

    See Also
    --------
    pdf
    sample

    """
    assert alpha > 1, "Power law exponent should be greater than 1!"
    beta = 1. - alpha
    if not xmax:
        k = 1.
        cdf = k - (x / xmin) ** beta
    else:
        k = xmax ** beta - xmin ** beta
        cdf = (x ** beta - xmin ** beta) / k
    return cdf


[docs]def histogram(arr, bins=None, plot=False, ax=None, **kwargs):
    """
    Plots a power law histogram using logarithmic binning.

    Parameters
    ----------
    arr : numpy.ndarray of 1-dimension
        sample of a power-law distributed random deviates
    bins : numpy.ndarray of 1-dimension, optional
        bins for which the histogram counts are obtained. This should be of
        ``shape = (len(arr) + 1, )``. If not provided, an array of
        logarithmically spaced bins is estimated using Sturges' formula [3]_.
    plot : boolean
        If ``True``, the histogram results are plotted on given axes or on a
        newly created ``matplotlib.axes`` object. Default ``= False``.
    ax : matplotlib.axes, optional
        axes on which the estimated histogram is to be plotted. If not
        provided, a standard ``matplotlib.axes`` object is created for plotting.

    Returns
    -------
    hist : numpy.ndarray of 1 dimension
        array containing the histogram counts/fractions for the given random
        sample. This is of length ``n``, where ``n`` is the number of bins
        given by Sturges formula.
    bins : numpy.ndarray of 1 dimension
        array containing the bin edges for the given random sample obtained by
        constructing ``n`` number of logarithmically spaced bins.
    ax : matplotlib.axes
        axes on which the histogram is plotted

    Notes
    -----
    The **kwargs** arguments are partly passed on to ``numpy.histogram`` (such
    as the ``density`` keyword argument), and to ``matplotlib.pyplot.plot``
    (such as the ``mec``, ``mfc``, and ``ms`` keywords), and finally to also
    specify axes labels and their fontsizes using ``xlab`` and ``fs`` keyword
    arguments. These could also, in principle, be done separately outside of
    this function by using the necessary methods of the ``ax`` object.

    See Also
    --------
    pdf
    sample

    References
    ----------
    .. [3]  Sturges, H. A. (1926).
            "The choice of a class interval".
            Journal of the American Statistical Association, 21(153), 65-66.
            http://www.tandfonline.com/doi/abs/10.1080/01621459.1926.10502161

    """
    # argument parsing
    density = False
    mec, mfc, ms = "none", "Black", 5
    xlab, fs = "Observable (Units)", 12
    for key in kwargs:
        exec("%s = kwargs[key]" % key)
    if not bins:
        bins = _logbins(arr)
    # histogram counts
    hist = np.histogram(arr, bins=bins, density=density)[0]

    # histogram plot
    if plot:
        # create axes if not given
        if not ax:
            fig = pl.figure(figsize=[8.5, 6.5])
            ax = fig.add_subplot(111)
        # plot histogram results
        mid_pts = 0.5 * (bins[:-1] + bins[1:])
        ax.plot(mid_pts, hist, "o",
                mec=mec, mfc=mfc, ms=ms)
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.grid(which="both", axis="both")
        ax.set_xlabel(xlab, fontsize=fs)
        ax.set_ylabel("Probability", fontsize=fs)

    # return histogram results and axis object for further plotting
    if plot:
        return hist, bins, ax
    else:
        return hist, bins


def _logbins(arr):
    """
    Returns an array of logarithmically spaced bins for given array.
    """
    n = len(arr)
    nbins = np.ceil(np.log2(n)) + 1         # Sturges formula
    arrmin = arr[arr != 0.].min()
    arrmax = arr.max()
    bins = np.logspace(np.log10(arrmin), np.log10(arrmax),
                       num=nbins, base=10)
    return bins