0: Review Concepts from STAT 414
Many of the concepts from STAT 414: Introduction to Probability Theory will carry over into this course. In this unit, some of the essential formulas and theorems are presented for review.
Objectives
Upon completion of this lesson, you should be able to recall and use your prior knowledge to:
- Define key terms and theorems related to probability.
- Summarize the properties and applications of discrete random variables.
- Distinguish between continuous random variables and discrete random variables, providing examples and explaining their differences.
- Explain the formulas and definitions of jointly distributed random variables and apply them to solve problems involving joint probability distributions.
- Describe the concept of statistical inference and its role in making conclusions about populations based on sample data.
0.1: Probability
In this section, we review some of the important definitions and theorems for probability. It would be helpful to refresh your memory on these concepts and do some practice.
Def. 1 (Event) Let the outcome space (or sample space) be denoted as \(\mathbf{S}\). An event is a subset of the sample space. Events are often denoted with capital letters, \(A, B, C, \ldots\). Therefore, if \(A\) is an event in \(\mathbf{S}\), then \(A\subset \mathbf{S}\).
Def. 2 (Probability) Probability is a logical framework for quantifying uncertainty or randomness in a principled way. It is primarily used to explain the uncertainty or randomness in obtaining the current data.
Def. 3 (Mutually Exclusive Events) Events \(A\) and \(B\) are called mutually exclusive (or disjoint) events if \(A\cap B=\emptyset\).
Def. 4 (Conditional Probability) The conditional probability of an event \(A\) given that an event \(B\) has occurred is: \[\begin{aligned} P(A|B)=\frac{P(A\cap B)}{P(B)} \end{aligned}\] as long as \(P(B)>0\).
Def. 5 (Independent Events) Events \(A\) and \(B\) are independent events if the occurrence of one of them does not affect the probability of the occurrence of the other. That is, two events are independent if \[\begin{aligned} P(B|A)=P(B) \end{aligned}\] provided that \(P(A)>0\) (or, equivalently, if \(P(A|B)=P(A)\), provided that \(P(B)>0\)). Events \(A\) and \(B\) are independent events if and only if \[\begin{aligned} P(A\cap B)=P(A)P(B) \end{aligned}\] Otherwise, \(A\) and \(B\) are called dependent events.
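Definitions 4 and 5 can be checked by brute-force enumeration over a finite sample space. Below is a minimal Python sketch; the two-dice events are illustrative examples chosen here, not taken from the text:

```python
from fractions import Fraction

# Sample space: all ordered pairs from rolling two fair six-sided dice.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(event) under the uniform distribution on S."""
    return Fraction(len([s for s in S if event(s)]), len(S))

A = lambda s: s[0] == 6          # event A: the first die shows 6
B = lambda s: s[0] + s[1] >= 10  # event B: the sum is at least 10

# Def. 4: P(A|B) = P(A and B) / P(B), with P(B) > 0
p_A_given_B = prob(lambda s: A(s) and B(s)) / prob(B)
print(p_A_given_B)  # 1/2

# Def. 5: here A and B are dependent, since P(A and B) != P(A)P(B)
print(prob(lambda s: A(s) and B(s)) == prob(A) * prob(B))  # False
```

Since \(P(A|B)=1/2\ne P(A)=1/6\), conditioning on \(B\) changes the probability of \(A\), which is exactly what dependence means.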
0.2: Distributions
We will use distributions heavily in this course. It is important to review the definitions and expectations of random variables. We start with a brief review of discrete distributions and then move on to continuous distributions.
Discrete Distributions
Def. 6 (Probability Mass Function) For a discrete random variable, \(X\), the probability mass function (pmf or PMF) is a function that satisfies the following properties:
\(f(x)=P(X=x)>0\), for all \(x\in\) the support \(S\)
\(\sum_{x\in S}f(x)=1\)
\(P(X\in A)=\sum_{x\in A}f(x)\), given event \(A\)
It is important to note that the pmf is the probability that \(X\) takes on a particular value, \(x\).
Def. 7 (Cumulative Distribution Function) For a discrete random variable, \(X\), the cumulative distribution function (cdf or CDF), denoted \(F(x)\) is defined as: \[\begin{aligned} F(x)=P(X\le x) \end{aligned}\]
The cdf of a random variable \(X\) has the following properties:
\(F_X(t)\) is a non-decreasing function of \(t\), for \(-\infty<t<\infty\).
\(F_X(t)\) ranges from 0 to 1.
If \(X\) is a discrete random variable whose minimum is \(a\), then \[\begin{aligned} F_X(a)=P(X\le a)=P(X=a)=f_X(a) \end{aligned}\] If \(c\) is less than \(a\), then \(F_X(c)=0\).
If the maximum value of \(X\) is \(b\), then \(F_X(b)=1\).
Also called the distribution function.
All probabilities concerning \(X\) can be stated in terms of \(F\).
Below is a list of some of the named discrete distributions from the previous course. It is important to be able to recall the probability mass functions, expected values, and variances for these discrete distributions.
Bernoulli: (p) with \(0<p<1\) \[\begin{aligned} & f(x)=p^x(1-p)^{1-x}, \qquad x=0, 1\\ & E(X)=p, \qquad \text{Var}(X)=p(1-p) \end{aligned}\]
Binomial: (n, p) with \(0<p<1\) \[\begin{aligned} & f(x)={n\choose x}p^x(1-p)^{n-x}, \qquad x=0, 1, \ldots, n\\ & E(X)=np, \qquad \text{Var}(X)=np(1-p) \end{aligned}\]
Geometric: (p) with \(0<p<1\) \[\begin{aligned} & f(x)=p(1-p)^{x-1}, \qquad x=1, 2, 3, \ldots\\ & E(X)=\frac{1}{p}, \qquad \text{Var}(X)=\frac{1-p}{p^2} \end{aligned}\]
Negative Binomial: (r, p) with \(0<p<1\) and \(r=1, 2, \ldots\) \[\begin{aligned} & f(x)={{x-1}\choose {r-1}}p^r(1-p)^{x-r}, \qquad x=r, r+1, r+2, \ldots\\ & E(X)=r\left(\frac{1}{p}\right), \qquad \text{Var}(X)=\frac{r(1-p)}{p^2} \end{aligned}\]
Hypergeometric: (n, \(N_1\), \(N_2\)) with \(N_1>0\), \(N_2>0\), \(N=N_1+N_2\), \(1\le n\le N_1+N_2\) \[\begin{aligned} & f(x)=\frac{{N_1\choose x} {N_2\choose{n-x}}}{{N\choose n}}, \qquad x\le n, \;\; x\le N_1, \;\; n-x\le N_2\\ & E(X)=n\left(\frac{N_1}{N}\right), \qquad \text{Var}(X)=n\left(\frac{N_1}{N}\right)\left(\frac{N_2}{N}\right)\left(\frac{N-n}{N-1}\right) \end{aligned}\]
Poisson: (\(\lambda\)) with \(\lambda>0\) \[\begin{aligned} & f(x)=\frac{e^{-\lambda}\lambda^x}{x!}, \qquad x=0, 1, 2, 3, \ldots\\ & E(X)=\lambda, \qquad \text{Var}(X)=\lambda \end{aligned}\]
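As a quick sanity check on the moment formulas above, they can be compared against SciPy's implementations. This sketch assumes SciPy is installed; the parameter values are arbitrary illustrations:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0

# Binomial(n, p): E(X) = np and Var(X) = np(1 - p)
X = stats.binom(n, p)
assert abs(X.mean() - n * p) < 1e-12
assert abs(X.var() - n * p * (1 - p)) < 1e-12
assert abs(X.pmf(range(n + 1)).sum() - 1) < 1e-12  # the pmf sums to 1

# Poisson(lambda): E(X) = Var(X) = lambda
Y = stats.poisson(lam)
assert abs(Y.mean() - lam) < 1e-12 and abs(Y.var() - lam) < 1e-12
```

Note that SciPy's `geom` uses the trial-counting convention with support \(x=1,2,\ldots\), matching the geometric pmf listed above.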
Continuous Distributions
In this section, we present some of the named distributions for continuous random variables.
Def. 8 (Probability Density Function) For a continuous random variable, \(X\), the probability density function (pdf or PDF) is a function that satisfies the following properties:
\(f(x)>0\), for all \(x\in\) the support \(S\)
The area under the curve \(f(x)\) in the support is equal to 1. That is: \[\begin{aligned} \int_{x\in S}f(x)\,dx=1 \end{aligned}\]
If \(A\) is an interval, then the probability that \(X\) belongs to \(A\) is: \[\begin{aligned} P(X\in A)=\int_{A}f(x)dx \end{aligned}\]
It is important to note that unlike the probability mass function (pmf) of a discrete random variable, \(f(x)\ne P(X=x)\) for a continuous random variable \(X\). In fact, \(P(X=x)=0\) if \(X\) is a continuous random variable.
Next, we will review the named continuous distributions. Again, it is important to recognize the probability density functions, expected values, and variances of these continuous distributions.
Beta: (\(\alpha, \beta\)), with \(\alpha>0\) and \(\beta>0\) \[\begin{aligned} &f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0<x<1\\ & E(X)=\frac{\alpha}{\alpha+\beta}, \qquad \text{Var}(X)=\frac{\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2} \end{aligned}\]
Exponential: (\(\theta\)), with \(\theta>0\) \[\begin{aligned} &f(x)=\frac{1}{\theta}e^{-x/\theta}, \qquad 0<x\\ & E(X)=\theta, \qquad \text{Var}(X)=\theta^2 \end{aligned}\]
Chi-square: \(\chi^2\)(r), with \(r>0\) \[\begin{aligned} &f(x)=\frac{1}{\Gamma(r/2)2^{r/2}}x^{r/2-1}e^{-x/2}, \qquad 0<x\\ & E(X)=r, \qquad \text{Var}(X)=2r \end{aligned}\]
Gamma: (\(\alpha\), \(\theta\)) with \(\alpha>0\) and \(\theta>0\) \[\begin{aligned} & f(x)=\frac{1}{\Gamma(\alpha)\theta^\alpha} x^{\alpha-1}e^{-x/\theta}, \qquad x>0\\ & E(X)=\alpha \theta, \qquad \text{Var}(X)=\alpha\theta^2 \end{aligned}\]
Normal: (\(\mu\), \(\sigma^2\)) with \(\sigma>0\) \[\begin{aligned} & f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \qquad -\infty<x<\infty\\ & E(X)=\mu, \qquad \text{Var}(X)=\sigma^2 \end{aligned}\]
Uniform: (\(a\), \(b\)) \[\begin{aligned} & f(x)=\frac{1}{b-a}, \qquad a<x<b\\ & E(X)=\frac{a+b}{2}, \qquad \text{Var}(X)=\frac{(b-a)^2}{12} \end{aligned}\]
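The continuous families can be spot-checked the same way. The sketch below (assuming SciPy is available, with arbitrary parameter values) also verifies condition 2 of Def. 8 by numerical integration:

```python
from scipy import stats
from scipy.integrate import quad

# Gamma(alpha, theta): SciPy uses shape a = alpha and scale = theta
alpha, theta = 3.0, 2.0
G = stats.gamma(a=alpha, scale=theta)
assert abs(G.mean() - alpha * theta) < 1e-12      # E(X) = alpha * theta
assert abs(G.var() - alpha * theta ** 2) < 1e-12  # Var(X) = alpha * theta^2

# The pdf integrates to 1 over the support (Def. 8, condition 2)
area, _ = quad(G.pdf, 0, float("inf"))
assert abs(area - 1) < 1e-6

# Exponential(theta) is the special case Gamma(alpha = 1, theta)
E = stats.expon(scale=theta)
assert abs(E.mean() - theta) < 1e-12 and abs(E.var() - theta ** 2) < 1e-12
```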
0.3: Mathematical Expectations
You should be able to find the expected value and variance of a given random variable, as well as expected values of functions of random variables. Expectations will become particularly important when we learn about properties of estimators in the next few lessons.
Def. 9 (Mathematical Expectation for a discrete random variable) If \(f(x)\) is the probability mass function (pmf) of the discrete random variable \(X\) with support \(S\), and if the summation \[\begin{aligned} \sum_{x\in S} u(x)f(x) \end{aligned}\] exists (that is, it is finite), then the resulting sum is called the mathematical expectation or the expected value of the function \(u(X)\). The expectation is denoted \[\begin{aligned} E[u(X)]=\sum_{x\in S}u(x)f(x) \end{aligned}\]
Def. 10 (Mathematical Expectation for a continuous random variable) For a continuous random variable, \(X\), with probability density function (pdf), \(f(x)\), the expected value of a function of \(X\), \(u(X)\), is \[\begin{aligned} E(u(X))=\int_{-\infty}^\infty u(x)f(x)\;dx \end{aligned}\]
Special mathematical expectations:
- If \(u(x)=x\) then \(E(u(X))=\mu\)
- If \(u(x)=(x-\mu)^2\), then \(E(u(X))=\sigma^2=\text{Var}(X)\).
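Definition 9 translates directly into code: pick a pmf, then sum \(u(x)f(x)\) over the support. The sketch below uses a Binomial(4, 0.5) pmf as an illustrative example:

```python
from math import comb

# Binomial(n = 4, p = 0.5) pmf over its support {0, 1, ..., n}
n, p = 4, 0.5
f = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

def expect(u):
    """E[u(X)] = sum over the support of u(x) f(x) (Def. 9)."""
    return sum(u(x) * fx for x, fx in f.items())

mu = expect(lambda x: x)               # u(x) = x          -> E(X) = np
var = expect(lambda x: (x - mu) ** 2)  # u(x) = (x - mu)^2 -> Var(X) = np(1 - p)
print(mu, var)  # 2.0 1.0
```

The two special expectations above fall out by choosing \(u(x)=x\) and \(u(x)=(x-\mu)^2\).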
0.4: Bivariate Distributions
In this class, it will be important to recall bivariate distributions and to be able to make the extension to multivariate distributions. A joint (bivariate) probability distribution describes the probability that a randomly selected object from the population has two characteristics of interest.
Recall the definition of a joint (bivariate) probability mass function.
Def. 11 (Joint Probability Mass Function) Let \(X\) and \(Y\) be two discrete random variables, and let \(S\) denote the two-dimensional support of \(X\) and \(Y\). The function \(f(x,y)=P(X=x, Y=y)\) is a joint probability mass function (pmf) if it satisfies the following three conditions:
\(0\le f(x,y)\le 1\)
\(\sum_{(x,y)\in S}\sum f(x,y)=1\)
\(P[(X,Y)\in A]=\sum_{(x,y)\in A}f(x,y)\), where \(A\) is a subset of the support \(S\).
Def. 12 (Joint Probability Density Function) Let \(X\) and \(Y\) be two continuous random variables, and let \(S\) denote the two-dimensional support of \(X\) and \(Y\). Then, the function \(f(x,y)\) is a joint probability density function (pdf) if it satisfies the following three conditions:
\(f(x,y)>0\)
\(\int_{-\infty}^\infty \int_{-\infty}^\infty f(x,y)dxdy=1\)
\(P[(X,Y)\in A]=\int \int_A f(x,y)dxdy\)
where \(\{(X,Y)\in A\}\) is an event in the \(xy\)-plane.
Def. 13 (Marginal PMFs) Let \(X\) be a discrete random variable with support \(S_1\), and let \(Y\) be a discrete random variable with support \(S_2\). Let \(X\) and \(Y\) have a joint probability mass function \(f(x,y)\) with support \(S\). Then, the probability mass function of \(X\) alone, which is called the marginal probability mass function of \(X\) is defined by: \[\begin{aligned} f_X(x)=\sum_{y}f(x,y)=P(X=x), \qquad x\in S_1 \end{aligned}\] where, for each \(x\) in the support \(S_1\), the summation is taken over all possible values of \(y\).
Similarly, the probability mass function of \(Y\) alone, which is called the marginal probability mass function of \(Y\) is defined by: \[\begin{aligned} f_Y(y)=\sum_{x}f(x,y)=P(Y=y), \qquad y\in S_2 \end{aligned}\] where, for each \(y\) in the support \(S_2\), the summation is taken over all possible values of \(x\).
Def. 14 (Marginal PDFs) Let \(X\) be a continuous random variable with support \(S_1\), and let \(Y\) be a continuous random variable with support \(S_2\). Let \(X\) and \(Y\) have a joint probability density function \(f(x,y)\) with support \(S\). Then, the probability density function of \(X\) alone, which is called the marginal probability density function of \(X\), is defined by: \[\begin{aligned} f_X(x)=\int_{y}f(x,y)dy, \qquad x\in S_1 \end{aligned}\]
Similarly, the probability density function of \(Y\) alone, which is called the marginal probability density function of \(Y\) is defined by: \[\begin{aligned} f_Y(y)=\int_{x}f(x,y)dx, \qquad y\in S_2 \end{aligned}\]
Suppose we wish to determine if two random variables are independent, given the joint probability distribution and the marginal distributions. The random variables, \(X\) and \(Y\) are independent if and only if \[\begin{aligned} f(x,y)=f_X(x)f_Y(y) \end{aligned}\] for all \(x \in S_1, \; y\in S_2\). Otherwise, \(X\) and \(Y\) are said to be dependent.
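Definition 13 and the independence criterion can be exercised on a small joint table. The joint pmf values below are made up for illustration:

```python
from fractions import Fraction as F

# A small joint pmf f(x, y) stored as a dict (illustrative values)
f = {(0, 0): F(1, 8), (0, 1): F(3, 8),
     (1, 0): F(1, 8), (1, 1): F(3, 8)}

xs = {x for x, _ in f}
ys = {y for _, y in f}

# Marginals (Def. 13): sum the joint pmf over the other variable
fX = {x: sum(f[x, y] for y in ys) for x in xs}
fY = {y: sum(f[x, y] for x in xs) for y in ys}

# X and Y are independent iff f(x, y) = fX(x) fY(y) for every (x, y)
independent = all(f[x, y] == fX[x] * fY[y] for x in xs for y in ys)
print(independent)  # True: this particular table factors
```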
Conditional Distributions
It is helpful to review conditional distributions for this course. They will be particularly useful when we cover Bayesian statistics.
A conditional probability distribution is a probability distribution for a sub-population. That is, a conditional distribution describes the probability that a randomly selected object from a sub-population has one characteristic of interest.
Def. 15 (Conditional probability mass function) The conditional probability mass function of \(X\), given \(Y=y\) is defined by: \[\begin{aligned} g(x|y)=\frac{f(x,y)}{f_Y(y)}, \qquad \text{provided }f_Y(y)>0 \end{aligned}\]
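Def. 15 is again a one-liner once the marginal is in hand. In this sketch (with made-up joint values under which \(X\) and \(Y\) are dependent), each conditional pmf \(g(\cdot|y)\) sums to 1, as any pmf must:

```python
from fractions import Fraction as F

# Joint pmf with dependent X and Y (illustrative values)
f = {(0, 0): F(1, 4), (0, 1): F(1, 4),
     (1, 0): F(1, 2), (1, 1): F(0, 1)}

# Marginal of Y, then g(x | y) = f(x, y) / fY(y) wherever fY(y) > 0
fY = {y: sum(p for (x, yy), p in f.items() if yy == y) for y in (0, 1)}
g = {(x, y): f[x, y] / fY[y] for (x, y) in f if fY[y] > 0}

print(g[0, 1])                       # (1/4) / (1/4) = 1
print(sum(g[x, 0] for x in (0, 1)))  # each conditional pmf sums to 1
```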
Distributions of Sums of Several Random Variables
In this class, we will need to find the distributions of functions of random variables, including functions of several random variables. It will be helpful to recall some theorems for the sum of several random variables.
Random Functions Associated with Normal Distributions
There are many theorems from the previous statistics course that you may find useful to recall this semester.
Def. 16 (t-distribution) If \(Z\sim N(0,1)\) and \(U\sim \chi^2(r)\) are independent, then the random variable \[\begin{aligned}
T=\frac{Z}{\sqrt{\frac{U}{r}}}
\end{aligned}\] follows a \(t\)-distribution with \(r\) degrees of freedom.
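Def. 16 can be checked by simulation: build \(T\) from an independent \(Z\) and \(U\) and compare its sample moments with the known \(t(r)\) moments (mean 0 and variance \(r/(r-2)\) for \(r>2\)). A sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 10, 200_000

# Construct T = Z / sqrt(U / r) directly from Def. 16
Z = rng.standard_normal(n)    # Z ~ N(0, 1)
U = rng.chisquare(r, size=n)  # U ~ chi-square(r), independent of Z
T = Z / np.sqrt(U / r)

# For r > 2, a t(r) random variable has mean 0 and variance r / (r - 2)
print(abs(T.mean()) < 0.02)               # sample mean is near 0
print(abs(T.var() - r / (r - 2)) < 0.05)  # sample variance is near 1.25
```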
0.5: Review Examples
In this section, we present some examples to help you recall some of the information from the Introduction to Probability course.
Example 1 A woman who was initially thought to have a 1% risk of cancer ended up with a positive mammogram result. A mammogram accurately classifies about 80% of cancerous tumors and 90% of benign tumors. What is the probability that the woman has cancer, given the positive result?
Solution Let the event \(C\) denote cancer and event \(P\) denote a positive result. We know: \(P(C)=0.01\) and thus \(P(C^\prime)=1-P(C)=1-0.01=0.99\). Also, \(P(P|C)=0.8\) and \(P(P^\prime|C^\prime)=0.9\), which implies \(P(P|C^\prime)=0.1\). Putting this all together, we get: \[P(C|P)=\frac{P(P|C)P(C)}{P(P|C)P(C)+P(P|C^\prime)P(C^\prime)}=\frac{0.8(0.01)}{0.8(0.01)+0.1(0.99)}=0.075\]
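The Bayes' rule computation in Example 1 is short enough to reproduce numerically:

```python
# Example 1: P(C | P) via Bayes' rule
p_c = 0.01          # prior: P(C)
p_pos_c = 0.80      # P(P | C): mammogram detects 80% of cancers
p_pos_not_c = 0.10  # P(P | C') = 1 - 0.90, since 90% of benign tumors test negative

posterior = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_not_c * (1 - p_c))
print(round(posterior, 3))  # 0.075
```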
Example 2 When you put your money into the coffee machine at the Student Center, a paper cup comes down and some coffee is put into it. You are supposed to get 8 oz of coffee. However, the actual amount of coffee dispensed is a normal random variable with mean equal to the machine setting and variance of 0.0625 oz\(^2\).
What should the machine setting be so that, in the long run, only 2% of the drinks will contain less than 8 oz?
\[\begin{align*} & P(X\le 8)=P\left(Z\le \frac{8-\mu}{\sqrt{0.0625}}\right)=0.02, \qquad \text{table value -2.0537}\\ & -2.0537= \frac{8-\mu}{\sqrt{0.0625}}\qquad \Rightarrow \mu=8.513425 \end{align*}\]
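The same calculation can be done with SciPy's inverse normal cdf, `norm.ppf` (assuming SciPy is available); the quantile agrees with the table value \(-2.0537\) used above:

```python
from scipy import stats

# Solve P(X <= 8) = 0.02 for mu, where X ~ N(mu, 0.0625), so sigma = 0.25
sigma = 0.0625 ** 0.5
z = stats.norm.ppf(0.02)  # about -2.0537
mu = 8 - sigma * z
print(round(mu, 4))       # about 8.5134
```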
Example 3 Let the random variable \(X\) have the following pdf \[\begin{align*} f(x)=\begin{cases} cx^2, & -1\le x<0\\ cx, & 0\le x\le 2\\ 0, & \text{otherwise} \end{cases} \end{align*}\] where \(c\) is a constant.
Find the value \(c\) such that the above is a valid probability density function.
Find the mean of the random variable.
Find the cdf of \(X\).
Find the median of the random variable.
\(\begin{align*}& \frac{1}{c}=\int_{-1}^0x^2\;dx+\int_0^2x\; dx=\frac{1}{3}x^3\mid_{-1}^0+\frac{1}{2}x^2\mid_0^2=\frac{1}{3}+2=\frac{7}{3}, \qquad c=\frac{3}{7} \end{align*}\)
\(\begin{align*} E(X)=\int_{-1}^0\frac{3}{7}x^3\; dx+\int_0^2\frac{3}{7}x^2\; dx=\frac{3}{28}x^4\mid_{-1}^0+\frac{1}{7}x^3\mid_0^2=-\frac{3}{28}+\frac{8}{7}=\frac{29}{28} \end{align*}\)
\(\begin{align*} F(x)=\begin{cases} \int_{-1}^x\frac{3}{7}x^2dx=\frac{1}{7}x^3\mid_{-1}^x=\frac{1}{7}(x^3+1), & -1\le x<0\\ \frac{1}{7}+\int_0^x \frac{3}{7}xdx=\frac{1}{7}+\frac{3}{14}x^2, & 0\le x<2\\ 1, & x\ge 2 \end{cases} \end{align*}\)
\(\begin{align*} 0.5=\frac{1}{7}+\frac{3}{14}x^2, \qquad \Rightarrow \frac{5}{14}=\frac{3}{14}x^2, \qquad \Rightarrow x=\sqrt{\frac{5}{3}}\approx 1.29 \end{align*}\)
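The answers in Example 3 can be verified by numerical integration; a sketch assuming SciPy is available:

```python
from scipy.integrate import quad

c = 3 / 7  # the normalizing constant found above

def f(x):
    """The piecewise pdf from Example 3."""
    if -1 <= x < 0:
        return c * x ** 2
    if 0 <= x <= 2:
        return c * x
    return 0.0

area, _ = quad(f, -1, 2, points=[0])                   # should be 1
mean, _ = quad(lambda x: x * f(x), -1, 2, points=[0])  # should be 29/28
med_cdf, _ = quad(f, -1, (5 / 3) ** 0.5, points=[0])   # F(median) should be 0.5

print(round(area, 6), round(mean, 6), round(med_cdf, 6))
```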
Example 4 The lifetime (in months) of two components of a machine have joint probability density function \[\begin{align*} f(x,y)=c(50-x-y), \qquad 0<x<50-y<50 \end{align*}\] and zero otherwise.
Find the constant, \(c\), such that \(f(x,y)\) is a valid joint pdf. You may leave the constant as \(c\).
Find the marginal distributions.
Find the conditional distribution of \(X\) given \(Y=y\). If convenient, you may leave the constant as \(c\).
1. \(\begin{align*}\
& \frac{1}{c}=\int_0^{50}\int_0^{50-y} 50-x-y\;dx\;dy=\int_0^{50} \left[50x-\frac{1}{2}x^2-yx\mid_0^{50-y}\right] dy\\
& =\int_0^{50} \frac{1}{2}y^2-50y+1250dy=\frac{1}{6}y^3-25y^2+1250y\mid_0^{50}=\frac{125000}{6}\\
& \Rightarrow c=\frac{6}{125000}
\end{align*}\)
2. \(\begin{align*}\
& f(x)=c\int_0^{50-x} 50-x-y\;dy=\frac{c}{2}(x-50)^2, \qquad 0\le x\le 50\\
& f(y)=c\int_0^{50-y} 50-x-y\;dx=\frac{c}{2}(y-50)^2, \qquad 0\le y\le 50
\end{align*}\)
3. \(\begin{align*}\
f(x|y)=\frac{f(x,y)}{f(y)}=\frac{c(50-x-y)}{\frac{c}{2}(y-50)^2}=\frac{2(50-x-y)}{(y-50)^2}, \qquad 0<x<50-y
\end{align*}\)
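Example 4's constant and marginals can likewise be checked numerically (a sketch assuming SciPy; note that `dblquad` integrates over the first argument of its integrand as the inner variable):

```python
from scipy.integrate import dblquad, quad

c = 6 / 125_000  # the constant found in part 1

# Integrate f(x, y) = c(50 - x - y) over 0 < x < 50 - y, 0 < y < 50.
# dblquad's integrand takes the inner variable (x here) first.
total, _ = dblquad(lambda x, y: c * (50 - x - y), 0, 50, 0, lambda y: 50 - y)
print(abs(total - 1) < 1e-8)  # True: f is a valid joint pdf

# Spot-check the marginal f_X(x) = (c/2)(x - 50)^2 at x = 10
fx, _ = quad(lambda y: c * (50 - 10 - y), 0, 40)
print(abs(fx - (c / 2) * (10 - 50) ** 2) < 1e-10)  # True
```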
Example 5 Suppose the cdf for the discrete random variable \(X\) is \[\begin{aligned} F(x)=\frac{x(x+1)}{20}, \qquad x=1,2,3,4. \end{aligned}\]
Find the pmf of \(X\). You may put it in table form.
Derive the \(30^{th}\) percentile of \(X\).
Derive the mean of \(X\).
Derive the standard deviation of \(X\).
Recall that the cdf is defined as \(F(x)=P(X\le x)\). Start with \(x=1\). We know \(F(1)=P(X\le 1)\). Since there are no values less than 1, it follows that \[\begin{aligned} & F(1)=P(X\le 1)=P(X=1)=\frac{1(2)}{20}=\frac{2}{20}\\ & f(1)=P(X=1)=\frac{1}{10} \end{aligned}\] Next, consider \(x=2\). \[\begin{aligned} & F(2)=P(X\le 2)=\frac{2(3)}{20}=\frac{3}{10}=0.3 \end{aligned}\] We also know that \[\begin{aligned} & F(2)=P(X\le 2)=P(X=1)+P(X=2) \end{aligned}\]
\[\begin{aligned} & 0.3=P(X=1)+P(X=2)\\ & \Rightarrow P(X=2)=0.3-P(X=1)=0.3-0.1=0.2 \end{aligned}\] Thus, \(f(2)=P(X=2)=0.2\). We can continue on in the same manner to find the pmf for the other values of \(x\). The tabular form of the pmf of \(X\) is:
\[\begin{array}{c|cccc} x & 1 & 2 & 3 & 4\\ \hline f(x) & 0.1 & 0.2 & 0.3 & 0.4 \end{array}\] Derive the \(30^{th}\) percentile of \(X\). \[\begin{aligned} & 0.3=\frac{x(x+1)}{20}\Rightarrow 6=x^2+x\Rightarrow x^2+x-6=0\\ & \Rightarrow (x+3)(x-2)=0\Rightarrow x=2 \end{aligned}\]
Derive the mean of \(X\). \[\begin{aligned} E(X)=0\left(0\right)+1\left(\frac{2}{20}\right)+2\left(\frac{4}{20}\right)+3\left(\frac{6}{20}\right)+4\left(\frac{8}{20}\right)=\frac{60}{20}=3 \end{aligned}\]
Derive the standard deviation of \(X\). \[\begin{aligned} & E(X^2)=0\left(0\right)+1\left(\frac{2}{20}\right)+4\left(\frac{4}{20}\right)+9\left(\frac{6}{20}\right)+16\left(\frac{8}{20}\right)=\frac{200}{20}=10\\ & \text{Var}(X)=E(X^2)-E(X)^2=10-3^2=10-9=1\\ & \text{SD}(X)=\sqrt{1}=1 \end{aligned}\]
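The differencing trick used in Example 5, \(f(x)=F(x)-F(x-1)\), is easy to automate; exact fractions avoid floating-point rounding:

```python
from fractions import Fraction as Fr

# Recover the pmf from the cdf F(x) = x(x + 1)/20 by differencing
def F(x):
    return Fr(x * (x + 1), 20)

support = [1, 2, 3, 4]
f = {x: F(x) - (F(x - 1) if x > 1 else 0) for x in support}
# f holds 1/10, 1/5, 3/10, 2/5, i.e. 0.1, 0.2, 0.3, 0.4 as in the table

mean = sum(x * f[x] for x in support)      # E(X)
ex2 = sum(x ** 2 * f[x] for x in support)  # E(X^2)
print(mean, ex2 - mean ** 2)  # 3 1
```

The variance is 1, so the standard deviation is \(\sqrt{1}=1\), matching the hand computation.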
0.6: Outline of STAT 415
Suppose that we obtain data \(x\) from a statistical model with an unknown parameter \(\theta\). Imagine that \(\theta\) indexes a family of possible distributions for \(x\), and that the true value of \(\theta\) determines the true data-generating process. We will then consider questions such as:
Point Estimation: What is a good estimator for \(\theta\)? This depends on the definition of 'good', and we will introduce several criteria by which estimators can be judged. Just providing a point estimator \(\hat{\theta}\), without any sense of its uncertainty, is usually unsatisfying. Thus, statistical inference emphasizes accompanying point estimators with information about their uncertainties, e.g., we may be able to describe the distribution of \(\hat{\theta}\) or at least say what its standard deviation is. The standard deviation of an estimator is called its standard error.
Interval Estimation: Intuitively, much more informative than just saying something like "I estimate \(\theta\) as 2.5" is to provide an interval, such as saying
"I am 95% confident that \(\theta\) is inclusively between 2.3 and 2.8."
But what does 'confident' mean? In this course we will define precisely what it means to give an interval estimate, and study ways of constructing such estimates.
If \(\theta\) is a constant, then it either is or isn’t in the interval \([2.3, 2.8]\), so what does the 95% mean?
Hypothesis Testing (model evaluation): In many applications in the physical, biological, and social sciences, a researcher is interested in testing a hypothesis which can be expressed in the form \(\theta=\theta_0\) or, more generally, as \(\theta\in H_0\) for some set \(H_0\). Hypothesis testing is closely related to interval estimation (and arguably the latter is more useful), and can be approached via various perspectives.