Binomial distribution

 

From Wikipedia, the free encyclopedia
[Figure: Probability mass function for the binomial distribution]
[Figure: Cumulative distribution function for the binomial distribution]
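Notation: $B(n, p)$
Parameters: $n \in \{0, 1, 2, \ldots\}$ – number of trials; $p \in [0, 1]$ – success probability for each trial; $q = 1 - p$
Support: $k \in \{0, 1, \ldots, n\}$ – number of successes
PMF: $\binom{n}{k} p^k q^{n-k}$
CDF: $I_q(n - \lfloor k \rfloor, 1 + \lfloor k \rfloor)$ (the regularized incomplete beta function)
Mean: $np$
Median: $\lfloor np \rfloor$ or $\lceil np \rceil$
Mode: $\lfloor (n+1)p \rfloor$ or $\lceil (n+1)p \rceil - 1$
Variance: $npq = np(1-p)$
Skewness: $\dfrac{q - p}{\sqrt{npq}}$
Ex. kurtosis: $\dfrac{1 - 6pq}{npq}$
Entropy: $\tfrac{1}{2} \log_2(2\pi e\, npq) + O\left(\tfrac{1}{n}\right)$ in shannons. For nats, use the natural log in the log.
MGF: $(q + p e^{t})^n$
CF: $(q + p e^{it})^n$
PGF: $G(z) = (q + pz)^n$
Fisher information: $g_n(p) = \dfrac{n}{pq}$ (for fixed $n$)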
[Figure: Binomial distribution for $p = 0.5$ with n and k as in Pascal's triangle]

The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is $\binom{8}{4} / 2^8 = 70/256$.

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.[1]

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.
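The quality of this approximation can be explored numerically. The following is a minimal sketch using only the Python standard library; the function names and the population sizes (N = 50 and N = 50,000, each with 30% successes) are illustrative assumptions, not taken from the article.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): sampling with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes) when n items are drawn without replacement from a
    population of N items, K of which count as successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def max_abs_gap(N: int, K: int, n: int) -> float:
    """Largest pointwise difference between the two PMFs over k = 0..n."""
    p = K / N
    return max(
        abs(binom_pmf(k, n, p) - hypergeom_pmf(k, N, K, n))
        for k in range(n + 1)
    )


# Illustrative values: sample size n = 10, success fraction 0.3.
small_population_gap = max_abs_gap(N=50, K=15, n=10)
large_population_gap = max_abs_gap(N=50_000, K=15_000, n=10)
print(small_population_gap, large_population_gap)
```

With the sample size held fixed, the gap for the larger population should come out much smaller, in line with the statement that the binomial model is a good approximation when N is much larger than n.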

Definitions

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting exactly k successes in n independent Bernoulli trials is given by the probability mass function:

$$f(k, n, p) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

for k = 0, 1, 2, ..., n, where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability $p^k$ and n − k failures occur with probability $(1-p)^{n-k}$. However, the k successes can occur anywhere among the n trials, and there are $\binom{n}{k}$ different ways of distributing k successes in a sequence of n trials.

In reference tables of binomial probabilities, the table is usually filled in only up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as

$$f(k, n, p) = f(n-k, n, 1-p).$$

Looking at the expression f(k, n, p) as a function of k, there is a k value that maximizes it. This k value can be found by calculating

$$\frac{f(k+1, n, p)}{f(k, n, p)} = \frac{(n-k)\,p}{(k+1)(1-p)}$$

and comparing it to 1. There is always an integer M that satisfies[2]

$$(n+1)p - 1 \le M < (n+1)p.$$

f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In this case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable outcome (that is, the most likely, although this can still be unlikely overall) of the Bernoulli trials and is called the mode.

Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is

$$f(4, 6, 0.3) = \binom{6}{4}\, 0.3^4\, (1 - 0.3)^{6-4} = 15 \times 0.0081 \times 0.49 = 0.059535.$$
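The coin example can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `math.comb`; the helper name `binom_pmf` is an illustrative choice, not part of the article.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 4 heads in 6 tosses of a coin with P(heads) = 0.3.
prob = binom_pmf(4, 6, 0.3)
print(round(prob, 6))  # 0.059535

# The complement identity used for reference tables:
# k successes with probability p is the same event as
# n - k failures with probability 1 - p.
mirrored = binom_pmf(6 - 4, 6, 1 - 0.3)
print(abs(prob - mirrored) < 1e-12)  # True
```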

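Cumulative distribution function

The cumulative distribution function can be expressed as:

$$F(k; n, p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1-p)^{n-i},$$

where $\lfloor k \rfloor$ is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:[3]

$$F(k; n, p) = \Pr(X \le k) = I_{1-p}(n-k,\, k+1) = (n-k) \binom{n}{k} \int_0^{1-p} t^{\,n-k-1} (1-t)^k \, dt,$$

which is equivalent to the cumulative distribution function of the F-distribution:[4]

$$F(k; n, p) = F_{F\text{-distribution}}\!\left(x = \frac{1-p}{p}\,\frac{k+1}{n-k};\; d_1 = 2(n-k),\; d_2 = 2(k+1)\right).$$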
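The sum form of the cumulative distribution function translates directly into code. The sketch below uses only the Python standard library; the names `binom_pmf` and `binom_cdf` are illustrative, and direct summation is chosen for clarity rather than numerical efficiency for large n.

```python
from math import comb, floor


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: float, n: int, p: float) -> float:
    """P(X <= k), summing the PMF over i = 0..floor(k).

    k may be any real number: values below 0 give 0.0 and values
    of n or above give the full sum (1 up to rounding error).
    """
    top = min(floor(k), n)
    if top < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(top + 1))


# At most 2 heads in 6 tosses of a coin with P(heads) = 0.3.
print(binom_cdf(2, 6, 0.3))
# Non-integer arguments are floored, so 2.7 gives the same value as 2.
print(binom_cdf(2.7, 6, 0.3) == binom_cdf(2, 6, 0.3))
```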

Some closed-form bounds for the cumulative distribution function are given below.

Properties

Expected value and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:[5]

$$\operatorname{E}[X] = np.$$

This follows from the linearity of the expected value along with the fact that X is the sum of n identical Bernoulli random variables, each with expected value p. In other words, if $X_1, \ldots, X_n$ are identical (and independent) Bernoulli random variables with parameter p, then $X = X_1 + \cdots + X_n$ and

$$\operatorname{E}[X] = \operatorname{E}[X_1 + \cdots + X_n] = \operatorname{E}[X_1] + \cdots + \operatorname{E}[X_n] = p + \cdots + p = np.$$

The variance is:

$$\operatorname{Var}(X) = npq = np(1-p).$$

This similarly follows from the fact that the variance of a sum of independent random variables is the sum of the variances.
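Both closed forms can be checked against the definition of the mean and variance as sums over the probability mass function. The following is a minimal sketch with illustrative names and parameter values (n = 20, p = 0.3).

```python
from math import comb


def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of B(n, p), computed directly from the PMF
    rather than from the closed forms np and np(1 - p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


n, p = 20, 0.3  # illustrative parameters
mean, var = binom_mean_var(n, p)
print(mean, n * p)            # both close to 6.0
print(var, n * p * (1 - p))   # both close to 4.2
```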

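Higher moments

The first 6 central moments, defined as $\mu_c = \operatorname{E}\left[(X - \operatorname{E}[X])^c\right]$, are given by

$$\begin{aligned}
\mu_1 &= 0, \\
\mu_2 &= np(1-p), \\
\mu_3 &= np(1-p)(1-2p), \\
\mu_4 &= np(1-p)\bigl(1 + (3n-6)\,p(1-p)\bigr), \\
\mu_5 &= np(1-p)(1-2p)\bigl(1 + (10n-12)\,p(1-p)\bigr), \\
\mu_6 &= np(1-p)\bigl(1 - 30p(1-p)(1 - 4p(1-p)) + 5np(1-p)(5 - 26p(1-p)) + 15n^2p^2(1-p)^2\bigr).
\end{aligned}$$

The non-central moments satisfy

$$\begin{aligned}
\operatorname{E}[X] &= np, \\
\operatorname{E}[X^2] &= np(1-p) + n^2p^2,
\end{aligned}$$

and in general [6] [7]

$$\operatorname{E}[X^c] = \sum_{k=0}^{c} \left\{ {c \atop k} \right\} n^{\underline{k}}\, p^k,$$

where $\left\{ {c \atop k} \right\}$ are the Stirling numbers of the second kind, and $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$ is the $k$th falling power of $n$. A simple bound [8] follows by bounding the binomial moments via the higher Poisson moments:

$$\operatorname{E}[X^c] \le \left(\frac{c}{\log(c/(np) + 1)}\right)^{c} \le (np)^c \exp\left(\frac{c^2}{2np}\right).$$

This shows that if $c = O(\sqrt{np})$, then $\operatorname{E}[X^c]$ is at most a constant factor away from $\operatorname{E}[X]^c$.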
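The general formula for the non-central moments in terms of Stirling numbers of the second kind and falling powers can be verified exactly with rational arithmetic. The sketch below is illustrative only: the function names and the test parameters (n = 7, p = 3/10) are assumptions, and the Stirling numbers are computed from their standard recurrence S(c, k) = k·S(c−1, k) + S(c−1, k−1).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(c: int, k: int) -> int:
    """Stirling number of the second kind S(c, k), via the recurrence
    S(c, k) = k * S(c-1, k) + S(c-1, k-1)."""
    if c == k:
        return 1
    if k == 0 or k > c:
        return 0
    return k * stirling2(c - 1, k) + stirling2(c - 1, k - 1)


def falling_power(n: int, k: int) -> int:
    """n (n-1) ... (n-k+1); the empty product (k = 0) is 1."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


def raw_moment_stirling(c: int, n: int, p: Fraction) -> Fraction:
    """E[X^c] from the Stirling-number formula."""
    return sum(
        (stirling2(c, k) * falling_power(n, k) * p**k for k in range(c + 1)),
        Fraction(0),
    )


def raw_moment_direct(c: int, n: int, p: Fraction) -> Fraction:
    """E[X^c] summed directly over the probability mass function."""
    return sum(
        (k**c * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)),
        Fraction(0),
    )


n, p = 7, Fraction(3, 10)  # illustrative parameters
for c in range(7):
    assert raw_moment_stirling(c, n, p) == raw_moment_direct(c, n, p)
print(raw_moment_stirling(2, n, p))  # equals np(1 - p) + n^2 p^2
```

Using `Fraction` instead of floats makes the two sides comparable with `==`, which avoids having to pick a rounding tolerance.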

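Mode

Usually the mode of a binomial B(n, p) distribution is equal to $\lfloor (n+1)p \rfloor$, where $\lfloor \cdot \rfloor$ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode will be 0 and n correspondingly. These cases can be summarized as follows:

$$\text{mode} = \begin{cases}
\lfloor (n+1)p \rfloor & \text{if } (n+1)p \text{ is 0 or a noninteger}, \\
(n+1)p \ \text{ and } \ (n+1)p - 1 & \text{if } (n+1)p \in \{1, \ldots, n\}, \\
n & \text{if } (n+1)p = n + 1.
\end{cases}$$

Proof: Let

$$f(k) = \binom{n}{k} p^k (1-p)^{n-k}.$$

For $p = 0$ only $f(0)$ has a nonzero value with $f(0) = 1$. For $p = 1$ we find $f(n) = 1$ and $f(k) = 0$ for $k \ne n$. This proves that the mode is 0 for $p = 0$ and $n$ for $p = 1$.

Let $0 < p < 1$. We find

$$\frac{f(k+1)}{f(k)} = \frac{(n-k)\,p}{(k+1)(1-p)}.$$

From this follows

$$\begin{aligned}
k > (n+1)p - 1 &\Rightarrow f(k+1) < f(k), \\
k = (n+1)p - 1 &\Rightarrow f(k+1) = f(k), \\
k < (n+1)p - 1 &\Rightarrow f(k+1) > f(k).
\end{aligned}$$

So when $(n+1)p - 1$ is an integer, then $(n+1)p - 1$ and $(n+1)p$ are both modes. In the case that $(n+1)p - 1 \notin \mathbb{Z}$, then only $\lfloor (n+1)p - 1 \rfloor + 1 = \lfloor (n+1)p \rfloor$ is a mode.[9]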
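The case analysis for the mode can be cross-checked by brute force. The sketch below compares the closed-form rule with a direct search for the largest PMF values; the function names are illustrative, and `Fraction` is used so that ties between two modes are detected exactly rather than up to floating-point error.

```python
from fractions import Fraction
from math import comb, floor


def modes_brute_force(n: int, p: Fraction) -> list[int]:
    """All k in 0..n at which the PMF of B(n, p) is largest."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    best = max(pmf)
    return [k for k, w in enumerate(pmf) if w == best]


def modes_closed_form(n: int, p: Fraction) -> list[int]:
    """Modes of B(n, p) from the case analysis on (n + 1) p."""
    if p == 0:
        return [0]
    if p == 1:
        return [n]
    m = (n + 1) * p
    if m == floor(m):          # (n + 1) p is an integer: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]          # otherwise a single mode


print(modes_closed_form(9, Fraction(1, 2)))    # [4, 5]
print(modes_closed_form(10, Fraction(3, 10)))  # [3]
```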
