Table of contents

Background
- Functions describing a probability distribution
- Sampling
List of probability distributions
- Exponential
- Logistic
- LogLogistic
- LogNormal
- Normal
- Uniform
- Weibull

Background

Functions describing a probability distribution

There exist multiple ways to describe a probability distribution of a univariate random variable $X$. In machine learning, we usually work with the following two functions:

Probability density function (PDF) $$p(x) dx = \Pr(X \in [x, x + dx))$$
Cumulative distribution function (CDF)

$$ \begin{aligned} F(x) &= \Pr(X \le x)\\ &= \int_{-\infty}^{x} p(u) du \end{aligned} $$

However, there also exist other options that can be convenient in practice. For example, when working with temporal point processes or survival analysis, we often prefer the following two functions:

Survival function (SF)

$$ \begin{aligned} S(x) &= \Pr(X > x)\\ &= 1 - F(x) \end{aligned} $$

Hazard function (a.k.a., failure rate or intensity)

$$ \begin{aligned} h(x) dx &= \Pr(X \in [x, x + dx) \mid X \ge x)\\ h(x) &= \frac{p(x)}{S(x)} \end{aligned} $$

Each of these four functions uniquely defines the distribution of $X$. Therefore, if we specify any one of $p(x)$, $F(x)$, $S(x)$, or $h(x)$, the other three functions are determined as well.
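
As a short worked illustration of this claim (a sketch derived from the definitions above, not an additional definition), suppose only the hazard $h(x)$ is given. Since $p(x) = -\frac{d}{dx} S(x)$, we have $h(x) = -\frac{d}{dx} \log S(x)$, and the remaining functions follow:

$$ \begin{aligned} S(x) &= \exp\left(-\int_{-\infty}^{x} h(u) du\right)\\ F(x) &= 1 - S(x)\\ p(x) &= h(x) S(x) \end{aligned} $$

For example, a constant hazard $h(x) = \lambda$ on $(0, \infty)$ gives $S(x) = \exp(-\lambda x)$, which is the exponential distribution listed below.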

Sampling

The inverse survival function $S^{-1}$ provides us with a simple way to generate samples of $X$ using uniform noise:

$$ \begin{aligned} u &\sim \operatorname{Uniform}([0, 1])\\ x &= S^{-1}(u) \end{aligned} $$

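This is equivalent to inverse transform sampling: since $1 - u$ is also uniformly distributed on $[0, 1]$ and $S^{-1}(u) = F^{-1}(1 - u)$, both procedures produce samples from the same distribution.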
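
The procedure can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library and the exponential inverse SF from the list further down; the function names `exponential_inverse_sf` and `sample_exponential` are made up for this example and do not refer to any particular library.

```python
import math
import random


def exponential_inverse_sf(u: float, rate: float) -> float:
    """Inverse survival function of the exponential distribution."""
    return -math.log(u) / rate


def sample_exponential(rate: float, rng: random.Random) -> float:
    # rng.random() lies in [0, 1); using 1 - u keeps the argument of log in (0, 1].
    u = 1.0 - rng.random()
    return exponential_inverse_sf(u, rate)


rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # should be close to 1 / rate = 0.5
```

With enough samples, the empirical mean approaches $1 / \lambda$, which is a quick sanity check that the inverse SF was implemented correctly.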

The above procedure generates a sample from the entire support of the distribution (e.g., between 0 and $\infty$). However, we can easily adapt it to draw samples only from an interval $[x_{\text{min}}, x_{\text{max}}]$. Because $S$ is decreasing, $S(x_{\text{max}})$ is the lower end of the range for $u$:

$$ \begin{aligned} a &= S(x_{\text{max}})\\ b &= S(x_{\text{min}})\\ u &\sim \operatorname{Uniform}([a, b])\\ x &= S^{-1}(u) \end{aligned} $$

Here is an example where this can be useful. Suppose $X$ corresponds to the inter-arrival time between events (e.g., failures of some machine in a factory). If we know that no failure occurred in the last 50 days, we can use the above procedure to draw samples conditioned on this fact by setting $x_{\text{min}} = 50$ and $x_{\text{max}} = \infty$, so that $a = S(\infty) = 0$.
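
The conditional sampling procedure might be implemented as follows. This is a sketch under stated assumptions: `sample_truncated` is a hypothetical helper, and the exponential distribution with a rate of 0.02 failures per day is chosen purely for illustration.

```python
import math
import random
from typing import Callable


def sample_truncated(
    sf: Callable[[float], float],
    inverse_sf: Callable[[float], float],
    x_min: float,
    x_max: float,
    rng: random.Random,
) -> float:
    """Draw one sample of X conditioned on x_min <= X <= x_max."""
    a = sf(x_max)  # lower end of the range for u (S is decreasing)
    b = sf(x_min)  # upper end of the range for u
    # rng.random() is in [0, 1), so u lies in (a, b] and never hits 0 exactly.
    u = b - (b - a) * rng.random()
    return inverse_sf(u)


# Assumed example distribution: exponential with rate 0.02 per day.
rate = 0.02
sf = lambda x: math.exp(-rate * x)
inverse_sf = lambda u: -math.log(u) / rate

rng = random.Random(0)
samples = [
    sample_truncated(sf, inverse_sf, 50.0, math.inf, rng) for _ in range(10_000)
]
mean = sum(samples) / len(samples)
```

Every sample is at least 50. For the exponential distribution specifically, memorylessness implies that the conditional mean is $50 + 1/\lambda = 100$, which the empirical mean should approximate.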

List of probability distributions

Exponential

Parameters
- rate $\lambda > 0$
Support: $(0, \infty)$
PDF $$p(x) = \lambda \exp(- \lambda x)$$
CDF $$F(x) = 1 - \exp(-\lambda x)$$
SF $$S(x) = \exp(-\lambda x)$$
Inverse SF $$S^{-1}(u) = -\frac{1}{\lambda} \log (u)$$

Logistic

Parameters
- location $\mu$
- scale $s > 0$
Support: $\mathbb{R}$
PDF $$p(x) = \frac{\exp\left(\frac{x - \mu}{s}\right)}{s \cdot \left(1 + \exp\left(\frac{x - \mu}{s}\right)\right)^2}$$
CDF $$F(x) = \frac{1}{1 + \exp\left(-\frac{x - \mu}{s}\right)}$$
SF $$S(x) = \frac{1}{1 + \exp\left(\frac{x - \mu}{s}\right)}$$
Inverse SF $$S^{-1}(u) = s \cdot \log\left(\frac{1-u}{u}\right) + \mu$$

LogLogistic

Parameters
- location $\mu$
- scale $s > 0$
Support: $(0, \infty)$
PDF $$p(x) = \frac{\exp\left(\frac{\log(x) - \mu}{s}\right)}{x \cdot s \cdot \left(1 + \exp\left(\frac{\log(x) - \mu}{s}\right)\right)^2}$$
CDF $$F(x) = \frac{1}{1 + \exp\left(-\frac{\log(x) - \mu}{s}\right)}$$
SF $$S(x) = \frac{1}{1 + \exp\left(\frac{\log(x) - \mu}{s}\right)}$$
Inverse SF $$S^{-1}(u) = \exp\left(s \cdot \log\left(\frac{1-u}{u}\right) + \mu\right)$$
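
The formulas above differ from the Logistic ones only by a $\log$ applied to $x$, so a LogLogistic sample can be obtained by exponentiating a Logistic sample. The sketch below checks this relationship numerically with the standard library; the function names are illustrative, not part of any existing API.

```python
import math


def logistic_inverse_sf(u: float, mu: float, s: float) -> float:
    return s * math.log((1.0 - u) / u) + mu


def loglogistic_sf(x: float, mu: float, s: float) -> float:
    return 1.0 / (1.0 + math.exp((math.log(x) - mu) / s))


def loglogistic_inverse_sf(u: float, mu: float, s: float) -> float:
    # Exponentiate the Logistic inverse SF to move from the real line to (0, inf).
    return math.exp(logistic_inverse_sf(u, mu, s))


mu, s = 0.5, 0.3
for u in (0.1, 0.5, 0.9):
    x = loglogistic_inverse_sf(u, mu, s)
    # Round trip: applying S to S^{-1}(u) should recover u.
    assert math.isclose(loglogistic_sf(x, mu, s), u)
```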

LogNormal

Parameters
- location $\mu$
- scale $s > 0$
Support: $(0, \infty)$
PDF $$p(x) = \frac{1}{x s\sqrt{2\pi}}\exp\left(-\frac{(\log (x) - \mu)^2}{2s^2}\right)$$
CDF $$F(x) = \Phi\left(\frac{\log(x)-\mu}{s}\right)$$ where $\Phi$ is the CDF of the standard normal distribution.
SF $$S(x) = \Phi\left(-\frac{\log(x)-\mu}{s}\right)$$
Inverse SF $$S^{-1}(u) = \exp\left(s \cdot \Phi^{-1}(1 - u) + \mu\right)$$
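
The LogNormal SF and inverse SF can be evaluated with Python's standard library, where `statistics.NormalDist` provides $\Phi$ as `cdf` and $\Phi^{-1}$ as `inv_cdf`. The following is a minimal sketch; the wrapper names `lognormal_sf` and `lognormal_inverse_sf` are assumptions made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def lognormal_sf(x: float, mu: float, s: float) -> float:
    return STD_NORMAL.cdf(-(math.log(x) - mu) / s)


def lognormal_inverse_sf(u: float, mu: float, s: float) -> float:
    # inv_cdf requires its argument to lie strictly between 0 and 1.
    return math.exp(s * STD_NORMAL.inv_cdf(1.0 - u) + mu)


mu, s = 1.0, 0.5
for u in (0.1, 0.5, 0.9):
    x = lognormal_inverse_sf(u, mu, s)
    # Round trip: applying S to S^{-1}(u) should recover u.
    assert math.isclose(lognormal_sf(x, mu, s), u, rel_tol=1e-9)
```

At $u = 0.5$ the inverse SF returns the median $\exp(\mu)$, which is a convenient spot check.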

Normal

Parameters
- location $\mu$
- scale $s > 0$
Support: $\mathbb{R}$
PDF $$p(x) = \frac{1}{s\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2s^2}\right)$$
CDF $$F(x) = \Phi\left(\frac{x-\mu}{s}\right)$$ where $\Phi$ is the CDF of the standard normal distribution.
SF $$S(x) = \Phi\left(-\frac{x-\mu}{s}\right)$$
Inverse SF $$S^{-1}(u) = s \cdot \Phi^{-1}(1 - u) + \mu$$

Uniform

Parameters
- lower boundary $a$
- upper boundary $b > a$
Support: $(a, b)$
PDF $$p(x) = \frac{1}{b - a}$$
CDF $$F(x) = \frac{x - a}{b - a}$$
SF $$S(x) = \frac{b - x}{b - a}$$
Inverse SF $$S^{-1}(u) = b - u \cdot (b - a)$$

Weibull

Parameters
- rate $b > 0$
- concentration $k > 0$
Support: $(0, \infty)$
PDF $$p(x) = b k x^{k-1} \exp(-bx^k)$$
CDF $$F(x) = 1 - \exp(-bx^k)$$
SF $$S(x) = \exp(-bx^k)$$
Inverse SF $$S^{-1}(u) = \left(-\frac{1}{b} \log (u)\right)^{\frac{1}{k}}$$
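
Dividing the PDF by the SF gives the Weibull hazard $h(x) = b k x^{k-1}$, which is increasing for $k > 1$, decreasing for $k < 1$, and constant for $k = 1$ (the exponential case). The sketch below verifies this numerically; the function names are illustrative only.

```python
import math


def weibull_pdf(x: float, b: float, k: float) -> float:
    return b * k * x ** (k - 1) * math.exp(-b * x**k)


def weibull_sf(x: float, b: float, k: float) -> float:
    return math.exp(-b * x**k)


def weibull_inverse_sf(u: float, b: float, k: float) -> float:
    return (-math.log(u) / b) ** (1.0 / k)


def weibull_hazard(x: float, b: float, k: float) -> float:
    # Hazard defined as the ratio of PDF to SF.
    return weibull_pdf(x, b, k) / weibull_sf(x, b, k)


b, k = 0.5, 2.0
for x in (0.5, 1.0, 2.0):
    # The ratio should match the closed form b * k * x^(k - 1).
    assert math.isclose(weibull_hazard(x, b, k), b * k * x ** (k - 1))
```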
