0: Review Concepts from STAT 414
Many of the concepts from STAT 414: Introduction to Probability Theory will carry over into this course. In this unit, some of the essential formulas and theorems are presented for review.
Objectives
Upon completion of this lesson, you should be able to recall and use your prior knowledge to:
- Define key terms and theorems related to probability.
- Summarize the properties and applications of discrete random variables.
- Distinguish between continuous random variables and discrete random variables, providing examples and explaining their differences.
- Explain the formulas and definitions of jointly distributed random variables and apply them to solve problems involving joint probability distributions.
- Describe the concept of statistical inference and its role in making conclusions about populations based on sample data.
0.1: Probability
In this section, we review some of the important definitions and theorems for probability. It would be helpful to refresh your memory on these concepts and do some practice.
Def. 1 (Event) Let the outcome space (or sample space) be denoted as \(\mathbf{S}\). An event is a subset of the sample space. Events are often denoted with capital letters, \(A, B, C, \ldots\). Therefore, if \(A\) is an event in \(\mathbf{S}\), then \(A\subset \mathbf{S}\).
Def. 2 (Probability) Probability is a logical framework for quantifying uncertainty or randomness in a principled way. It is primarily used to explain the uncertainty or randomness in obtaining the current data.
Def. 3 (Mutually Exclusive Events) Events \(A\) and \(B\) are called mutually exclusive (or disjoint) events if \(A\cap B=\emptyset\).
Def. 4 (Conditional Probability) The conditional probability of an event \(A\) given that an event \(B\) has occurred is: \[\begin{aligned} P(A|B)=\frac{P(A\cap B)}{P(B)} \end{aligned}\] as long as \(P(B)>0\).
Def. 5 (Independent Events) Events \(A\) and \(B\) are independent events if the occurrence of one of them does not affect the probability of the occurrence of the other. That is, two events are independent if \[\begin{aligned} P(B|A)=P(B) \end{aligned}\] provided that \(P(A)>0\) (or, equivalently, if \(P(A|B)=P(A)\), provided that \(P(B)>0\)). Events \(A\) and \(B\) are independent events if and only if \[\begin{aligned} P(A\cap B)=P(A)P(B) \end{aligned}\] Otherwise, \(A\) and \(B\) are called dependent events.
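Definitions 4 and 5 can be checked by brute-force enumeration over a finite sample space. Below is a minimal Python sketch; the two-dice events are illustrative examples chosen here, not taken from the text:

```python
from fractions import Fraction

# Sample space: all ordered pairs from rolling two fair six-sided dice.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(event) under the uniform distribution on S."""
    return Fraction(len([s for s in S if event(s)]), len(S))

A = lambda s: s[0] == 6          # event A: the first die shows 6
B = lambda s: s[0] + s[1] >= 10  # event B: the sum is at least 10

# Def. 4: P(A|B) = P(A and B) / P(B), with P(B) > 0
p_A_given_B = prob(lambda s: A(s) and B(s)) / prob(B)
print(p_A_given_B)  # 1/2

# Def. 5: here A and B are dependent, since P(A and B) != P(A)P(B)
print(prob(lambda s: A(s) and B(s)) == prob(A) * prob(B))  # False
```

Since \(P(A|B)=1/2\ne P(A)=1/6\), conditioning on \(B\) changes the probability of \(A\), which is exactly what dependence means.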
0.2: Distributions
We will use distributions heavily in this course. It is important to review the definitions and expectations of random variables. We start with a brief review of discrete distributions and then move on to continuous distributions.
Discrete Distributions
Def. 6 (Probability Mass Function) For a discrete random variable, \(X\), the probability mass function (pmf or PMF) is a function that satisfies the following properties:
\(f(x)=P(X=x)>0\), for all \(x\in\) the support \(S\)
\(\sum_{x\in S}f(x)=1\)
\(P(X\in A)=\sum_{x\in A}f(x)\), given event \(A\)
It is important to note that the pmf is the probability that \(X\) takes on a particular value, \(x\).
Def. 7 (Cumulative Distribution Function) For a discrete random variable, \(X\), the cumulative distribution function (cdf or CDF), denoted \(F(x)\) is defined as: \[\begin{aligned} F(x)=P(X\le x) \end{aligned}\]
The cdf of a random variable \(X\) has the following properties:
\(F_X(t)\) is a non-decreasing function of \(t\), for \(-\infty<t<\infty\).
\(F_X(t)\) ranges from 0 to 1.
If \(X\) is a discrete random variable whose minimum is \(a\), then \[\begin{aligned} F_X(a)=P(X\le a)=P(X=a)=f_X(a) \end{aligned}\] If \(c\) is less than \(a\), then \(F_X(c)=0\).
If the maximum value of \(X\) is \(b\), then \(F_X(b)=1\).
Also called the distribution function.
All probabilities concerning \(X\) can be stated in terms of \(F\).
Below is a list of some of the named discrete distributions from the previous course. It is important to be able to recall the probability mass functions, expected values, and variances for these discrete distributions.
Bernoulli: (p) with \(0<p<1\) \[\begin{aligned} & f(x)=p^x(1-p)^{1-x}, \qquad x=0, 1\\ & E(X)=p, \qquad \text{Var}(X)=p(1-p) \end{aligned}\]
Binomial: (n, p) with \(0<p<1\) \[\begin{aligned} & f(x)={n\choose x}p^x(1-p)^{n-x}, \qquad x=0, 1, \ldots, n\\ & E(X)=np, \qquad \text{Var}(X)=np(1-p) \end{aligned}\]
Geometric: (p) with \(0<p<1\) \[\begin{aligned} & f(x)=p(1-p)^{x-1}, \qquad x=1, 2, 3, \ldots\\ & E(X)=\frac{1}{p}, \qquad \text{Var}(X)=\frac{1-p}{p^2} \end{aligned}\]
Negative Binomial: (r, p) with \(0<p<1\) and \(r=1, 2, \ldots\) \[\begin{aligned} & f(x)={{x-1}\choose {r-1}}p^r(1-p)^{x-r}, \qquad x=r, r+1, r+2, \ldots\\ & E(X)=r\left(\frac{1}{p}\right), \qquad \text{Var}(X)=\frac{r(1-p)}{p^2} \end{aligned}\]
Hypergeometric: (n, \(N_1\), \(N_2\)) with \(N_1>0\), \(N_2>0\), \(N=N_1+N_2\), \(1\le n\le N_1+N_2\) \[\begin{aligned} & f(x)=\frac{{N_1\choose x} {N_2\choose{n-x}}}{{N\choose n}}, \qquad x\le n, \;\; x\le N_1, \;\; n-x\le N_2\\ & E(X)=n\left(\frac{N_1}{N}\right), \qquad \text{Var}(X)=n\left(\frac{N_1}{N}\right)\left(\frac{N_2}{N}\right)\left(\frac{N-n}{N-1}\right) \end{aligned}\]
Poisson: (\(\lambda\)) with \(\lambda>0\) \[\begin{aligned} & f(x)=\frac{e^{-\lambda}\lambda^x}{x!}, \qquad x=0, 1, 2, 3, \ldots\\ & E(X)=\lambda, \qquad \text{Var}(X)=\lambda \end{aligned}\]
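As a quick sanity check on the moment formulas above, they can be compared against SciPy's implementations. This sketch assumes SciPy is installed; the parameter values are arbitrary illustrations:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0

# Binomial(n, p): E(X) = np and Var(X) = np(1 - p)
X = stats.binom(n, p)
assert abs(X.mean() - n * p) < 1e-12
assert abs(X.var() - n * p * (1 - p)) < 1e-12
assert abs(X.pmf(range(n + 1)).sum() - 1) < 1e-12  # the pmf sums to 1

# Poisson(lambda): E(X) = Var(X) = lambda
Y = stats.poisson(lam)
assert abs(Y.mean() - lam) < 1e-12 and abs(Y.var() - lam) < 1e-12
```

Note that SciPy's `geom` uses the trial-counting convention with support \(x=1,2,\ldots\), matching the geometric pmf listed above.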
Continuous Distributions
In this section, we present some of the named distributions for continuous random variables.
Def. 8 (Probability Density Function) For a continuous random variable, \(X\), the probability density function (pdf or PDF) is a function that satisfies the following properties:
\(f(x)>0\), for all \(x\in\) the support \(S\)
The area under the curve \(f(x)\) in the support is equal to 1. That is: \[\begin{aligned} \int_{x\in S}f(x)\,dx=1 \end{aligned}\]
If \(A\) is an interval, then the probability that \(X\) belongs to \(A\) is: \[\begin{aligned} P(X\in A)=\int_{A}f(x)dx \end{aligned}\]
It is important to note that unlike the probability mass function (pmf) of a discrete random variable, \(f(x)\ne P(X=x)\) for a continuous random variable \(X\). In fact, \(P(X=x)=0\) if \(X\) is a continuous random variable.
Next, we will review the named continuous distributions. Again, it is important to recognize the probability density functions, expected values, and variances of these continuous distributions.
Beta: (\(\alpha, \beta\)), with \(\alpha>0\) and \(\beta>0\) \[\begin{aligned} &f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0<x<1\\ & E(X)=\frac{\alpha}{\alpha+\beta}, \qquad \text{Var}(X)=\frac{\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2} \end{aligned}\]
Exponential: (\(\theta\)), with \(\theta>0\) \[\begin{aligned} &f(x)=\frac{1}{\theta}e^{-x/\theta}, \qquad 0<x\\ & E(X)=\theta, \qquad \text{Var}(X)=\theta^2 \end{aligned}\]
Chi-square: \(\chi^2\)(r), with \(r>0\) \[\begin{aligned} &f(x)=\frac{1}{\Gamma(r/2)2^{r/2}}x^{r/2-1}e^{-x/2}, \qquad 0<x\\ & E(X)=r, \qquad \text{Var}(X)=2r \end{aligned}\]
Gamma: (\(\alpha\), \(\theta\)) with \(\alpha>0\) and \(\theta>0\) \[\begin{aligned} & f(x)=\frac{1}{\Gamma(\alpha)\theta^\alpha} x^{\alpha-1}e^{-x/\theta}, \qquad x>0\\ & E(X)=\alpha \theta, \qquad \text{Var}(X)=\alpha\theta^2 \end{aligned}\]
Normal: (\(\mu\), \(\sigma^2\)) with \(\sigma>0\) \[\begin{aligned} & f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \qquad -\infty<x<\infty\\ & E(X)=\mu, \qquad \text{Var}(X)=\sigma^2 \end{aligned}\]
Uniform: (\(a\), \(b\)) \[\begin{aligned} & f(x)=\frac{1}{b-a}, \qquad a<x<b\\ & E(X)=\frac{a+b}{2}, \qquad \text{Var}(X)=\frac{(b-a)^2}{12} \end{aligned}\]
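The continuous families can be spot-checked the same way. The sketch below (assuming SciPy is available, with arbitrary parameter values) also verifies condition 2 of Def. 8 by numerical integration:

```python
from scipy import stats
from scipy.integrate import quad

# Gamma(alpha, theta): SciPy uses shape a = alpha and scale = theta
alpha, theta = 3.0, 2.0
G = stats.gamma(a=alpha, scale=theta)
assert abs(G.mean() - alpha * theta) < 1e-12      # E(X) = alpha * theta
assert abs(G.var() - alpha * theta ** 2) < 1e-12  # Var(X) = alpha * theta^2

# The pdf integrates to 1 over the support (Def. 8, condition 2)
area, _ = quad(G.pdf, 0, float("inf"))
assert abs(area - 1) < 1e-6

# Exponential(theta) is the special case Gamma(alpha = 1, theta)
E = stats.expon(scale=theta)
assert abs(E.mean() - theta) < 1e-12 and abs(E.var() - theta ** 2) < 1e-12
```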
0.3: Mathematical Expectations
You should be able to find the expected value and variance of a given random variable, as well as expected values of functions of random variables. Expectations will become particularly important when we learn about properties of estimators in the next few lessons.
Def. 9 (Mathematical Expectation for a discrete random variable) If \(f(x)\) is the probability mass function (pmf) of the discrete random variable \(X\) with support \(S\), and if the summation \[\begin{aligned} \sum_{x\in S} u(x)f(x) \end{aligned}\] exists (that is, it is finite), then the resulting sum is called the mathematical expectation or the expected value of the function \(u(X)\). The expectation is denoted \[\begin{aligned} E[u(X)]=\sum_{x\in S}u(x)f(x) \end{aligned}\]
Def. 10 (Mathematical Expectation for a continuous random variable) For a continuous random variable, \(X\), with probability density function (pdf), \(f(x)\), the expected value of a function of \(X\), \(u(X)\), is \[\begin{aligned} E(u(X))=\int_{-\infty}^\infty u(x)f(x)\;dx \end{aligned}\]
Special mathematical expectations:
- If \(u(x)=x\) then \(E(u(X))=\mu\)
- If \(u(x)=(x-\mu)^2\), then \(E(u(X))=\sigma^2=\text{Var}(X)\).
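Definition 9 translates directly into code: pick a pmf, then sum \(u(x)f(x)\) over the support. The sketch below uses a Binomial(4, 0.5) pmf as an illustrative example:

```python
from math import comb

# Binomial(n = 4, p = 0.5) pmf over its support {0, 1, ..., n}
n, p = 4, 0.5
f = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

def expect(u):
    """E[u(X)] = sum over the support of u(x) f(x) (Def. 9)."""
    return sum(u(x) * fx for x, fx in f.items())

mu = expect(lambda x: x)               # u(x) = x          -> E(X) = np
var = expect(lambda x: (x - mu) ** 2)  # u(x) = (x - mu)^2 -> Var(X) = np(1 - p)
print(mu, var)  # 2.0 1.0
```

The two special expectations above fall out by choosing \(u(x)=x\) and \(u(x)=(x-\mu)^2\).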
0.4: Bivariate Distributions
In this class, it will be important to recall bivariate distributions and to be able to make the extension to multivariate distributions. A joint (bivariate) probability distribution describes the probability that a randomly selected object from the population has two characteristics of interest.
Recall the definition of a joint (bivariate) probability mass function.
Def. 11 (Joint Probability Mass Function) Let \(X\) and \(Y\) be two discrete random variables, and let \(S\) denote the two-dimensional support of \(X\) and \(Y\). The function \(f(x,y)=P(X=x, Y=y)\) is a joint probability mass function (pmf) if it satisfies the following three conditions:
\(0\le f(x,y)\le 1\)
\(\sum_{(x,y)\in S}\sum f(x,y)=1\)
\(P[(X,Y)\in A]=\sum_{(x,y)\in A}f(x,y)\), where \(A\) is a subset of the support \(S\).
Def. 12 (Joint Probability Density Function) Let \(X\) and \(Y\) be two continuous random variables, and let \(S\) denote the two-dimensional support of \(X\) and \(Y\). Then, the function \(f(x,y)\) is a joint probability density function (pdf) if it satisfies the following three conditions:
\(f(x,y)>0\)
\(\int_{-\infty}^\infty \int_{-\infty}^\infty f(x,y)dxdy=1\)
\(P[(X,Y)\in A]=\int \int_A f(x,y)dxdy\)
where \(\{(X,Y)\in A\}\) is an event in the \(xy\)-plane.
Def. 13 (Marginal PMFs) Let \(X\) be a discrete random variable with support \(S_1\), and let \(Y\) be a discrete random variable with support \(S_2\). Let \(X\) and \(Y\) have a joint probability mass function \(f(x,y)\) with support \(S\). Then, the probability mass function of \(X\) alone, which is called the marginal probability mass function of \(X\) is defined by: \[\begin{aligned} f_X(x)=\sum_{y}f(x,y)=P(X=x), \qquad x\in S_1 \end{aligned}\] where, for each \(x\) in the support \(S_1\), the summation is taken over all possible values of \(y\).
Similarly, the probability mass function of \(Y\) alone, which is called the marginal probability mass function of \(Y\) is defined by: \[\begin{aligned} f_Y(y)=\sum_{x}f(x,y)=P(Y=y), \qquad y\in S_2 \end{aligned}\] where, for each \(y\) in the support \(S_2\), the summation is taken over all possible values of \(x\).
Def. 14 (Marginal PDFs) Let \(X\) be a continuous random variable with support \(S_1\), and let \(Y\) be a continuous random variable with support \(S_2\). Let \(X\) and \(Y\) have a joint probability density function \(f(x,y)\) with support \(S\). Then, the probability density function of \(X\) alone, which is called the marginal probability density function of \(X\), is defined by: \[\begin{aligned} f_X(x)=\int_{y}f(x,y)dy, \qquad x\in S_1 \end{aligned}\]
Similarly, the probability density function of \(Y\) alone, which is called the marginal probability density function of \(Y\) is defined by: \[\begin{aligned} f_Y(y)=\int_{x}f(x,y)dx, \qquad y\in S_2 \end{aligned}\]
Suppose we wish to determine if two random variables are independent, given the joint probability distribution and the marginal distributions. The random variables, \(X\) and \(Y\) are independent if and only if \[\begin{aligned} f(x,y)=f_X(x)f_Y(y) \end{aligned}\] for all \(x \in S_1, \; y\in S_2\). Otherwise, \(X\) and \(Y\) are said to be dependent.
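Definition 13 and the independence criterion can be exercised on a small joint table. The joint pmf values below are made up for illustration:

```python
from fractions import Fraction as F

# A small joint pmf f(x, y) stored as a dict (illustrative values)
f = {(0, 0): F(1, 8), (0, 1): F(3, 8),
     (1, 0): F(1, 8), (1, 1): F(3, 8)}

xs = {x for x, _ in f}
ys = {y for _, y in f}

# Marginals (Def. 13): sum the joint pmf over the other variable
fX = {x: sum(f[x, y] for y in ys) for x in xs}
fY = {y: sum(f[x, y] for x in xs) for y in ys}

# X and Y are independent iff f(x, y) = fX(x) fY(y) for every (x, y)
independent = all(f[x, y] == fX[x] * fY[y] for x in xs for y in ys)
print(independent)  # True: this particular table factors
```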
Conditional Distributions
It is helpful to review conditional distributions for this course. They will be particularly useful when we cover Bayesian statistics.
A conditional probability distribution is a probability distribution for a sub-population. That is, a conditional distribution describes the probability that a randomly selected object from a sub-population has one characteristic of interest.
Def. 15 (Conditional probability mass function) The conditional probability mass function of \(X\), given \(Y=y\) is defined by: \[\begin{aligned} g(x|y)=\frac{f(x,y)}{f_Y(y)}, \qquad \text{provided }f_Y(y)>0 \end{aligned}\]
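Def. 15 is again a one-liner once the marginal is in hand. In this sketch (with made-up joint values under which \(X\) and \(Y\) are dependent), each conditional pmf \(g(\cdot|y)\) sums to 1, as any pmf must:

```python
from fractions import Fraction as F

# Joint pmf with dependent X and Y (illustrative values)
f = {(0, 0): F(1, 4), (0, 1): F(1, 4),
     (1, 0): F(1, 2), (1, 1): F(0, 1)}

# Marginal of Y, then g(x | y) = f(x, y) / fY(y) wherever fY(y) > 0
fY = {y: sum(p for (x, yy), p in f.items() if yy == y) for y in (0, 1)}
g = {(x, y): f[x, y] / fY[y] for (x, y) in f if fY[y] > 0}

print(g[0, 1])                       # (1/4) / (1/4) = 1
print(sum(g[x, 0] for x in (0, 1)))  # each conditional pmf sums to 1
```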
Distributions of Sums of Several Random Variables
In this class, we will need to find the distributions of functions of random variables, including functions of several random variables. It will be helpful to recall some theorems for the sum of several random variables.
Random Functions Associated with Normal Distributions
There are many theorems from the previous statistics course that you may find useful to recall this semester.
Def. 16 (t-distribution) If \(Z\sim N(0,1)\) and \(U\sim \chi^2(r)\) are independent, then the random variable \[\begin{aligned}
T=\frac{Z}{\sqrt{\frac{U}{r}}}
\end{aligned}\] follows a \(t\)-distribution with \(r\) degrees of freedom.
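Def. 16 can be checked by simulation: build \(T\) from an independent \(Z\) and \(U\) and compare its sample moments with the known \(t(r)\) moments (mean 0 and variance \(r/(r-2)\) for \(r>2\)). A sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 10, 200_000

# Construct T = Z / sqrt(U / r) directly from Def. 16
Z = rng.standard_normal(n)    # Z ~ N(0, 1)
U = rng.chisquare(r, size=n)  # U ~ chi-square(r), independent of Z
T = Z / np.sqrt(U / r)

# For r > 2, a t(r) random variable has mean 0 and variance r / (r - 2)
print(abs(T.mean()) < 0.02)               # sample mean is near 0
print(abs(T.var() - r / (r - 2)) < 0.05)  # sample variance is near 1.25
```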
0.5: Review Examples
In this section, we present some examples to help you recall some of the information from the Introduction to Probability course.
Example 1 A woman who was initially thought to have a 1% risk of cancer ended up with a positive mammogram result. A mammogram accurately classifies about 80% of cancerous tumors and 90% of benign tumors. What is the probability that the woman has cancer, given the positive result?
Solution Let the event \(C\) denote cancer and event \(P\) denote a positive result. We know: \(P(C)=0.01\) and thus \(P(C^\prime)=1-P(C)=1-0.01=0.99\). Also, \(P(P|C)=0.8\) and \(P(P^\prime|C^\prime)=0.9\), which implies \(P(P|C^\prime)=0.1\). Putting this all together, we get: \[P(C|P)=\frac{P(P|C)P(C)}{P(P|C)P(C)+P(P|C^\prime)P(C^\prime)}=\frac{0.8(0.01)}{0.8(0.01)+0.1(0.99)}=0.075\]
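The Bayes' rule computation in Example 1 is short enough to reproduce numerically:

```python
# Example 1: P(C | P) via Bayes' rule
p_c = 0.01          # prior: P(C)
p_pos_c = 0.80      # P(P | C): mammogram detects 80% of cancers
p_pos_not_c = 0.10  # P(P | C') = 1 - 0.90, since 90% of benign tumors test negative

posterior = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_not_c * (1 - p_c))
print(round(posterior, 3))  # 0.075
```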
Example 2 When you put your money into the coffee machine at the Student Center, a paper cup comes down and some coffee is put into it. You are supposed to get 8 oz of coffee. However, the actual amount of coffee dispensed is a normal random variable with mean equal to the machine setting and variance of 0.0625 oz\(^2\).
What should the machine setting be so that, in the long run, only 2% of the drinks will contain less than 8 oz?
\[\begin{align*} & P(X\le 8)=P\left(Z\le \frac{8-\mu}{\sqrt{0.0625}}\right)=0.02, \qquad \text{table value -2.0537}\\ & -2.0537= \frac{8-\mu}{\sqrt{0.0625}}\qquad \Rightarrow \mu=8.513425 \end{align*}\]
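The same calculation can be done with SciPy's inverse normal cdf, `norm.ppf` (assuming SciPy is available); the quantile agrees with the table value \(-2.0537\) used above:

```python
from scipy import stats

# Solve P(X <= 8) = 0.02 for mu, where X ~ N(mu, 0.0625), so sigma = 0.25
sigma = 0.0625 ** 0.5
z = stats.norm.ppf(0.02)  # about -2.0537
mu = 8 - sigma * z
print(round(mu, 4))       # about 8.5134
```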
Example 3 Let the random variable \(X\) have the following pdf \[\begin{align*} f(x)=\begin{cases} cx^2, & -1\le x<0\\ cx, & 0\le x\le 2\\ 0, & \text{otherwise} \end{cases} \end{align*}\] where \(c\) is a constant.
Find the value \(c\) such that the above is a valid probability density function.
Find the mean of the random variable.
Find the cdf of \(X\).
Find the median of the random variable.
\(\begin{align*}& \frac{1}{c}=\int_{-1}^0x^2\;dx+\int_0^2x\; dx=\frac{1}{3}x^3\mid_{-1}^0+\frac{1}{2}x^2\mid_0^2=\frac{1}{3}+2=\frac{7}{3}, \qquad c=\frac{3}{7} \end{align*}\)
\(\begin{align*} E(X)=\int_{-1}^0\frac{3}{7}x^3\; dx+\int_0^2\frac{3}{7}x^2\; dx=\frac{3}{28}x^4\mid_{-1}^0+\frac{1}{7}x^3\mid_0^2=-\frac{3}{28}+\frac{8}{7}=\frac{29}{28} \end{align*}\)
\(\begin{align*} F(x)=\begin{cases} \int_{-1}^x\frac{3}{7}x^2dx=\frac{1}{7}x^3\mid_{-1}^x=\frac{1}{7}(x^3+1), & -1\le x<0\\ \frac{1}{7}+\int_0^x \frac{3}{7}xdx=\frac{1}{7}+\frac{3}{14}x^2, & 0\le x<2\\ 1, & x\ge 2 \end{cases} \end{align*}\)
\(\begin{align*} 0.5=\frac{1}{7}+\frac{3}{14}x^2, \qquad \Rightarrow \frac{5}{14}=\frac{3}{14}x^2, \qquad \Rightarrow x=\sqrt{\frac{5}{3}}\approx 1.29 \end{align*}\)
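The answers in Example 3 can be verified by numerical integration; a sketch assuming SciPy is available:

```python
from scipy.integrate import quad

c = 3 / 7  # the normalizing constant found above

def f(x):
    """The piecewise pdf from Example 3."""
    if -1 <= x < 0:
        return c * x ** 2
    if 0 <= x <= 2:
        return c * x
    return 0.0

area, _ = quad(f, -1, 2, points=[0])                   # should be 1
mean, _ = quad(lambda x: x * f(x), -1, 2, points=[0])  # should be 29/28
med_cdf, _ = quad(f, -1, (5 / 3) ** 0.5, points=[0])   # F(median) should be 0.5

print(round(area, 6), round(mean, 6), round(med_cdf, 6))
```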
Example 4 The lifetime (in months) of two components of a machine have joint probability density function \[\begin{align*} f(x,y)=c(50-x-y), \qquad 0<x<50-y<50 \end{align*}\] and zero otherwise.
Find the constant, \(c\), such that \(f(x,y)\) is a valid joint pdf. You may leave the constant as \(c\).
Find the marginal distributions.
Find the conditional distribution of \(X\) given \(Y=y\). If convenient, you may leave the constant as \(c\).
1. \(\begin{align*}\
& \frac{1}{c}=\int_0^{50}\int_0^{50-y} 50-x-y\;dx\;dy=\int_0^{50} \left[50x-\frac{1}{2}x^2-yx\mid_0^{50-y}\right] dy\\
& =\int_0^{50} \frac{1}{2}y^2-50y+1250dy=\frac{1}{6}y^3-25y^2+1250y\mid_0^{50}=\frac{125000}{6}\\
& \Rightarrow c=\frac{6}{125000}
\end{align*}\)
2. \(\begin{align*}\
& f(x)=c\int_0^{50-x} 50-x-y\;dy=\frac{c}{2}(x-50)^2, \qquad 0\le x\le 50\\
& f(y)=c\int_0^{50-y} 50-x-y\;dx=\frac{c}{2}(y-50)^2, \qquad 0\le y\le 50
\end{align*}\)
3. \(\begin{align*}\
f(x|y)=\frac{f(x,y)}{f(y)}=\frac{c(50-x-y)}{\frac{c}{2}(y-50)^2}=\frac{2(50-x-y)}{(y-50)^2}, \qquad 0<x<50-y
\end{align*}\)
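Example 4's constant and marginals can likewise be checked numerically (a sketch assuming SciPy; note that `dblquad` integrates over the first argument of its integrand as the inner variable):

```python
from scipy.integrate import dblquad, quad

c = 6 / 125_000  # the constant found in part 1

# Integrate f(x, y) = c(50 - x - y) over 0 < x < 50 - y, 0 < y < 50.
# dblquad's integrand takes the inner variable (x here) first.
total, _ = dblquad(lambda x, y: c * (50 - x - y), 0, 50, 0, lambda y: 50 - y)
print(abs(total - 1) < 1e-8)  # True: f is a valid joint pdf

# Spot-check the marginal f_X(x) = (c/2)(x - 50)^2 at x = 10
fx, _ = quad(lambda y: c * (50 - 10 - y), 0, 40)
print(abs(fx - (c / 2) * (10 - 50) ** 2) < 1e-10)  # True
```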
Example 5 Suppose the cdf for the discrete random variable \(X\) is \[\begin{aligned} F(x)=\frac{x(x+1)}{20}, \qquad x=1,2,3,4. \end{aligned}\]
Find the pmf of \(X\). You may put it in table form.
Derive the \(30^{th}\) percentile of \(X\).
Derive the mean of \(X\).
Derive the standard deviation of \(X\).
Recall that the cdf is defined as \(F(x)=P(X\le x)\). Start with \(x=1\). We know \(F(1)=P(X\le 1)\). Since there are no values less than 1, it follows that \[\begin{aligned} & F(1)=P(X\le 1)=P(X=1)=\frac{1(2)}{20}=\frac{2}{20}\\ & f(1)=P(X=1)=\frac{1}{10} \end{aligned}\] Next, consider \(x=2\). \[\begin{aligned} & F(2)=P(X\le 2)=\frac{2(3)}{20}=\frac{3}{10}=0.3 \end{aligned}\] We also know that \[\begin{aligned} & F(2)=P(X\le 2)=P(X=1)+P(X=2) \end{aligned}\]
\[\begin{aligned} & 0.3=P(X=1)+P(X=2)\\ & \Rightarrow P(X=2)=0.3-P(X=1)=0.3-0.1=0.2 \end{aligned}\] Thus, \(f(2)=P(X=2)=0.2\). We can continue on in the same manner to find the pmf for the other values of \(x\). The tabular form of the pmf of \(X\) is:
\[\begin{array}{c|cccc} x & 1 & 2 & 3 & 4\\ \hline f(x) & 0.1 & 0.2 & 0.3 & 0.4 \end{array}\] Derive the \(30^{th}\) percentile of \(X\). \[\begin{aligned} & 0.3=\frac{x(x+1)}{20}\Rightarrow 6=x^2+x\Rightarrow x^2+x-6=0\\ & \Rightarrow (x+3)(x-2)=0\Rightarrow x=2 \end{aligned}\]
Derive the mean of \(X\). \[\begin{aligned} E(X)=0\left(0\right)+1\left(\frac{2}{20}\right)+2\left(\frac{4}{20}\right)+3\left(\frac{6}{20}\right)+4\left(\frac{8}{20}\right)=\frac{60}{20}=3 \end{aligned}\]
Derive the standard deviation of \(X\). \[\begin{aligned} & E(X^2)=0\left(0\right)+1\left(\frac{2}{20}\right)+4\left(\frac{4}{20}\right)+9\left(\frac{6}{20}\right)+16\left(\frac{8}{20}\right)=\frac{200}{20}=10\\ & \text{Var}(X)=E(X^2)-E(X)^2=10-3^2=10-9=1\\ & \text{SD}(X)=\sqrt{1}=1 \end{aligned}\]
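The differencing trick used in Example 5, \(f(x)=F(x)-F(x-1)\), is easy to automate; exact fractions avoid floating-point rounding:

```python
from fractions import Fraction as Fr

# Recover the pmf from the cdf F(x) = x(x + 1)/20 by differencing
def F(x):
    return Fr(x * (x + 1), 20)

support = [1, 2, 3, 4]
f = {x: F(x) - (F(x - 1) if x > 1 else 0) for x in support}
# f holds 1/10, 1/5, 3/10, 2/5, i.e. 0.1, 0.2, 0.3, 0.4 as in the table

mean = sum(x * f[x] for x in support)      # E(X)
ex2 = sum(x ** 2 * f[x] for x in support)  # E(X^2)
print(mean, ex2 - mean ** 2)  # 3 1
```

The variance is 1, so the standard deviation is \(\sqrt{1}=1\), matching the hand computation.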
0.6: Outline of STAT 415
Suppose that we obtain data \(x\) from a statistical model with an unknown parameter \(\theta\). Imagine that \(\theta\) indexes a family of possible distributions for \(x\), and that the true value of \(\theta\) determines the true data-generating process. We will then consider questions such as:
Point Estimation: What is a good estimator for \(\theta\)? This depends on the definition of 'good', and we will introduce several criteria by which estimators can be judged. Just providing a point estimator \(\hat{\theta}\), without any sense of its uncertainty, is usually unsatisfying. Thus, statistical inference emphasizes accompanying point estimators with information about their uncertainties, e.g., we may be able to describe the distribution of \(\hat{\theta}\) or at least say what its standard deviation is. The standard deviation of an estimator is called its standard error.
Interval Estimation: Intuitively, much more informative than just saying something like "I estimate \(\theta\) as 2.5" is to provide an interval, such as saying
"I am 95% confident that \(\theta\) is inclusively between 2.3 and 2.8."
But what does 'confident' mean? In this course we will define precisely what it means to give an interval estimate, and study ways of constructing such estimates.
If \(\theta\) is a constant, then it either is or isn’t in the interval \([2.3, 2.8]\), so what does the 95% mean?
Hypothesis Testing (model evaluation): In many applications in the physical, biological, and social sciences, a researcher is interested in testing a hypothesis which can be expressed in the form \(\theta=\theta_0\) or, more generally, as \(\theta\in H_0\) for some set \(H_0\). Hypothesis testing is closely related to interval estimation (and arguably the latter is more useful), and can be approached via various perspectives.