EGN3443 Probability and Statistics for Engineers

Hui Yang

This course presents the theory and methods of probability and statistical modeling needed to support engineering decision making. The course objectives are:

To understand the basic concepts of probability and statistics.

To understand data representation techniques.

To learn discrete and continuous random variables, probability distributions, measures of central tendency, and measures of dispersion.

To learn statistical inference and hypothesis testing.

To understand regression analysis using least squares parameter estimation.

To develop a statistical way of thinking.

Bayes Theorem

1. Suppose there is a rare cancer, and the probability that a randomly selected person has this cancer is 1/10000. Suppose also that there is a lab blood test for this cancer that is 98% accurate, in the sense that 98% of those who have the cancer test positive, but the test also shows positive for 2% of those who do not have the cancer.

If a randomly selected person takes the lab blood test and the result is positive, what is the probability that he really has this cancer?

Let's look at the following confusion matrix (what a good name! :)

True positive: Diseased people correctly tested positive

False positive: Healthy people wrongly tested as positive

True negative: Healthy people correctly tested as negative

False negative: Diseased people wrongly tested as negative

		Actual condition
		Disease	No disease
Test result	Positive	True positive (disease reported and present): P(positive | disease)	False positive, Type I error (disease reported but not present): P(positive | no disease)
Test result	Negative	False negative, Type II error (disease present but not detected): P(negative | disease)	True negative (disease not reported and not present): P(negative | no disease)

Note: false positive rate and false negative rate are not necessarily the same. They are both 0.02 in this example for simplicity.

P(positive | disease) = 0.98

But what we want to know is P(disease | positive).

$$
\begin{aligned}
P(\text{disease} \mid \text{positive})
&= \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}{P(\text{positive})} \\
&= \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}{P(\text{positive} \mid \text{disease})\, P(\text{disease}) + P(\text{positive} \mid \text{no disease})\, P(\text{no disease})} \\
&= \frac{0.98 \times 0.0001}{0.98 \times 0.0001 + 0.02 \times 0.9999} \approx 0.004877
\end{aligned}
$$

Knowing that you tested positive increased your probability of having the disease from 0.0001 to 0.004877, but not all the way to 0.98.
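The calculation above can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the function name `posterior` and its parameter names are illustrative choices.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive) using Bayes' theorem."""
    # P(positive | disease) * P(disease)
    true_positive_mass = sensitivity * prior
    # P(positive | no disease) * P(no disease)
    false_positive_mass = false_positive_rate * (1.0 - prior)
    # The denominator is P(positive), by the law of total probability.
    return true_positive_mass / (true_positive_mass + false_positive_mass)


p = posterior(prior=0.0001, sensitivity=0.98, false_positive_rate=0.02)
print(round(p, 6))  # 0.004877
```

Calling the same function with a different `prior` shows how strongly the answer depends on how common the disease is in the population.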

2. Let's go back and look at what would happen if 60% of the original population had the disease (not 1/10000 any more).

When the prevalence of the disease is 0.60, the probability of having the disease given a positive test result is

$$
\begin{aligned}
P(\text{disease} \mid \text{positive})
&= \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}{P(\text{positive})} \\
&= \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}{P(\text{positive} \mid \text{disease})\, P(\text{disease}) + P(\text{positive} \mid \text{no disease})\, P(\text{no disease})} \\
&= \frac{0.98 \times 0.6}{0.98 \times 0.6 + 0.02 \times 0.4} \approx 0.986577
\end{aligned}
$$

If the prevalence of the disease is 60%, then knowing that you tested positive increases your probability of having the disease from 60% to 98.6577%.

3. What prevalence of the disease makes P(disease | positive) = P(positive | disease), if the accuracy of the lab test remains the same?

Solve the following equation for x, the prevalence of the disease:

$$
0.98 = \frac{0.98\,x}{0.98\,x + 0.02\,(1 - x)}
$$
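One possible way to work through question 3 is sketched below. It is offered as a check on your own algebra, not as the official solution. Multiply both sides by the denominator, divide by 0.98, and collect terms:

$$
\begin{aligned}
0.98\,\bigl(0.98\,x + 0.02\,(1 - x)\bigr) &= 0.98\,x \\
0.98\,x + 0.02\,(1 - x) &= x \\
0.02\,(1 - x) &= 0.02\,x \\
x &= 0.5
\end{aligned}
$$

Under these numbers, the two conditional probabilities coincide when half of the population has the disease. This result relies on the false positive rate and the false negative rate both being 0.02 in this example.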

4. Bulls mind-expanding: If a randomly selected person tests negative, what is the probability that he does not have the disease?
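If you would like to check your answer to question 4, the sketch below applies Bayes' theorem to a negative result. It assumes the numbers from question 1 (prevalence 1/10000, 2% false negatives, 2% false positives); the variable names are illustrative.

```python
prior = 0.0001                # P(disease), assumed from question 1
specificity = 0.98            # P(negative | no disease) = 1 - 0.02
false_negative_rate = 0.02    # P(negative | disease) = 1 - 0.98

# Joint probabilities of each way a negative result can occur.
healthy_and_negative = specificity * (1.0 - prior)
diseased_and_negative = false_negative_rate * prior

# P(no disease | negative) = P(negative, no disease) / P(negative)
p_healthy_given_negative = healthy_and_negative / (
    healthy_and_negative + diseased_and_negative
)
print(f"{p_healthy_given_negative:.6f}")  # 0.999998
```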

Discrete Random Variables and Probability Distribution

Rolling Two Fair Dice

1. List all possible outcomes (a,b) of rolling the two dice. Let a denote the number on the top of the first die and b the number on the top of the second die. Note that each of a and b can be any of the integers from 1 through 6.

(1,1)	(1,2)	(1,3)	(1,4)	(1,5)	(1,6)
(2,1)	(2,2)	(2,3)	(2,4)	(2,5)	(2,6)
(3,1)	(3,2)	(3,3)	(3,4)	(3,5)	(3,6)
(4,1)	(4,2)	(4,3)	(4,4)	(4,5)	(4,6)
(5,1)	(5,2)	(5,3)	(5,4)	(5,5)	(5,6)
(6,1)	(6,2)	(6,3)	(6,4)	(6,5)	(6,6)

2. Let the random variable X be the sum of the values shown after throwing the two dice. What does the probability mass function (PMF) f(x) look like?

Table of probability mass function (PMF) f(x) of rolling two fair dice

x	f(x) = P(X=x)
2	1/36
3	2/36
4	3/36
5	4/36
6	5/36
7	6/36
8	5/36
9	4/36
10	3/36
11	2/36
12	1/36

P(X=x) on the y-axis vs. x on the x-axis
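The PMF table can be reproduced by enumerating all 36 equally likely outcomes and counting how many give each sum. The following Python sketch uses only the standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered pairs (a, b) with a, b in 1..6.
outcomes = list(product(range(1, 7), repeat=2))

# Number of outcomes that produce each sum a + b.
counts = Counter(a + b for a, b in outcomes)

# f(x) = (number of outcomes with sum x) / 36, kept as exact fractions.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, f"{counts[x]}/36", f"{float(p):.4f}")
```

Note that `Fraction` reduces to lowest terms (for example, 2/36 is stored as 1/18), which is why the loop prints the raw count over 36 to match the table.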

3. Calculate the mean E(X) and variance V(X).

Discrete Random Variable X - sum of two rolling dice (mean and variance)
X	f(x)			x*f(x)	(x-µ)^2	(x-µ)^2*f(x)
2	1/36 =	0.02777778	2*1/36=	0.05555556	25	0.694444444
3	2/36 =	0.05555556	3*2/36=	0.16666667	16	0.888888889
4	3/36 =	0.08333333	4*3/36=	0.33333333	9	0.75
5	4/36 =	0.11111111	5*4/36=	0.55555556	4	0.444444444
6	5/36 =	0.13888889	6*5/36=	0.83333333	1	0.138888889
7	6/36 =	0.16666667	7*6/36=	1.16666667	0	0
8	5/36 =	0.13888889	8*5/36=	1.11111111	1	0.138888889
9	4/36 =	0.11111111	9*4/36=	1	4	0.444444444
10	3/36 =	0.08333333	10*3/36=	0.83333333	9	0.75
11	2/36 =	0.05555556	11*2/36=	0.61111111	16	0.888888889
12	1/36 =	0.02777778	12*1/36=	0.33333333	25	0.694444444

			µ=	7.00	σ²=	5.83

					σ=	2.42
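The column sums in the table follow the definitions E(X) = Σ x·f(x) and V(X) = Σ (x-µ)²·f(x). The sketch below repeats the calculation with exact fractions. The closed form `6 - abs(x - 7)` for the number of outcomes with sum x is a shortcut introduced here, not something stated in the handout.

```python
from fractions import Fraction
from math import sqrt

# PMF of the sum of two fair dice: (6 - |x - 7|) outcomes out of 36 give sum x.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mean = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V(X)
std_dev = sqrt(variance)                                     # sigma

print(f"mean = {float(mean):.2f}")          # mean = 7.00
print(f"variance = {float(variance):.2f}")  # variance = 5.83
print(f"std dev = {std_dev:.2f}")           # std dev = 2.42
```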

4. Please find the probability that the sum of rolling two fair dice is less than or equal to a certain value, fill out the following table, and draw the graph for F(x).

Cumulative Distribution Function of the sum of rolling two dice
x	F(x)=P(X≤x)=P(- ∞)+…+P(x-1)+P(x)	F(x) as a sum of PMF values	F(x) as a decimal	P(X>x)=1-F(x)
-2	P(- ∞)+…+P(-4)+P(-3)+P(-2)	0	0.0000	1.0000
-1	P(- ∞)+…+P(-3)+P(-2)+P(-1)	0	0.0000	1.0000
0	P(- ∞)+…+P(-2)+P(-1)+P(0)	0	0.0000	1.0000
1	P(- ∞)+…+P(-1)+P(0)+P(1)	0	0.0000	1.0000
2	P(- ∞)+…+P(0)+P(1)+P(2)	1/36	0.0278	0.9722
3	P(- ∞)+…+P(1)+P(2)+P(3)	1/36+2/36	0.0833	0.9167
4	P(- ∞)+…+P(2)+P(3)+P(4)	1/36+2/36+3/36	0.1667	0.8333
5	P(- ∞)+…+P(3)+P(4)+P(5)	1/36+2/36+3/36+4/36	0.2778	0.7222
6	P(- ∞)+…+P(4)+P(5)+P(6)	1/36+2/36+3/36+4/36+5/36	0.4167	0.5833
7	P(- ∞)+…+P(5)+P(6)+P(7)	1/36+2/36+3/36+4/36+5/36+6/36	0.5833	0.4167
8	P(- ∞)+…+P(6)+P(7)+P(8)	1/36+2/36+3/36+4/36+5/36+6/36+5/36	0.7222	0.2778
9	P(- ∞)+…+P(7)+P(8)+P(9)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36	0.8333	0.1667
10	P(- ∞)+…+P(8)+P(9)+P(10)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36	0.9167	0.0833
11	P(- ∞)+…+P(9)+P(10)+P(11)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36+2/36	0.9722	0.0278
12	P(- ∞)+…+P(10)+P(11)+P(12)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36+2/36+1/36	1.0000	0.0000
13	P(- ∞)+…+P(11)+P(12)+P(13)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36+2/36+1/36+0	1.0000	0.0000
14	P(- ∞)+…+P(12)+P(13)+P(14)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36+2/36+1/36+0+0	1.0000	0.0000
15	P(- ∞)+…+P(13)+P(14)+P(15)	1/36+2/36+3/36+4/36+5/36+6/36+5/36+4/36+3/36+2/36+1/36+0+0+0	1.0000	0.0000
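The table above is a running sum of the PMF. A short Python sketch of the same idea follows; the function name `cdf` is illustrative, and the PMF is rebuilt here so the block is self-contained.

```python
from fractions import Fraction
from itertools import accumulate

support = range(2, 13)
# PMF of the sum of two fair dice: (6 - |x - 7|) / 36.
pmf = [Fraction(6 - abs(x - 7), 36) for x in support]

# Running totals of the PMF give F(x) at each point of the support.
cdf_values = dict(zip(support, accumulate(pmf)))


def cdf(x):
    """F(x) = P(X <= x) for the sum of two fair dice."""
    if x < 2:
        return Fraction(0)   # no probability mass below 2
    if x >= 12:
        return Fraction(1)   # all probability mass is at or below 12
    return cdf_values[int(x)]  # step function: flat between integers


for x in range(-2, 16):
    F = cdf(x)
    print(x, f"{float(F):.4f}", f"{float(1 - F):.4f}")
```

Because F(x) only jumps at the integers 2 through 12, its graph is a staircase that starts at 0 and reaches 1 at x = 12.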

Cumulative Distribution Function of the sum of rolling two dice

Confidence Interval

A machine is set up such that the average content of juice per bottle equals μ.

A sample of 100 bottles yields an average content of 48oz.

Calculate a 90% and a 95% confidence interval for the average content.

Assume that the population standard deviation σ = 5oz.

100(1-α)%	90%	95%	99%
Z_α/2	1.645	1.96	2.576

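With σ known, the confidence interval is x̄ ± Z_α/2 · σ/√n.

90%: 48 ± 1.645 × 5/√100 = 48 ± 0.8225, i.e. (47.18, 48.82) oz

95%: 48 ± 1.96 × 5/√100 = 48 ± 0.98, i.e. (47.02, 48.98) oz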
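The two intervals can be computed with a few lines of Python. This sketch assumes the standard known-σ interval x̄ ± z·σ/√n and uses the z values from the table above; the function name is illustrative.

```python
from math import sqrt


def z_confidence_interval(sample_mean, sigma, n, z):
    """Two-sided confidence interval for the mean when sigma is known."""
    margin = z * sigma / sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin


ci_90 = z_confidence_interval(sample_mean=48, sigma=5, n=100, z=1.645)
ci_95 = z_confidence_interval(sample_mean=48, sigma=5, n=100, z=1.96)

print(f"90% CI: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")  # 90% CI: (47.18, 48.82)
print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")  # 95% CI: (47.02, 48.98)
```

The 95% interval is wider than the 90% interval: more confidence requires a larger margin of error for the same sample.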

What sample size is required to make sure the margin of error (MOE) is within 0.5oz at the 95% confidence level? (±0.5 oz)

Assume that the population standard deviation σ = 5oz.

n = (Z_α/2 · σ / MOE)² = (1.96*5/0.5)² = 19.6² = 384.16, which is rounded up to n = 385

Hypothesis Testing

A machine is set up such that the average content of juice per bottle equals μ. A sample of 36 bottles yields an average content of 51.5oz. Test the hypothesis that the average content per bottle is 50oz at the 5% significance level.

Assume that the population standard deviation σ = 5oz.

Classical approach:

Steps:

(a) Formulate H₀ and H₁

H₀: μ=50      H₁: μ≠50

(b) Calculate the test statistic Z₀

Z₀ = (x̄ - μ₀)/(σ/√n) = (51.5 - 50)/(5/√36) = 1.8

(c) For the two-sided test, reject H₀ if Z₀>Z_α/2 or Z₀<-Z_α/2

Z_α/2 = Z_0.025 = 1.96

Since -Z_α/2 < Z₀ < Z_α/2 (that is, -1.96 < 1.8 < 1.96), Z₀ falls within the acceptance region and the null hypothesis cannot be rejected.
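The classical approach can be sketched in Python using the standard library's `statistics.NormalDist` for the critical value. The variable names are illustrative, and the test statistic is the usual known-σ form Z₀ = (x̄ - μ₀)/(σ/√n).

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 51.5, 50, 5, 36, 0.05

# Test statistic for a z-test with known population standard deviation.
z0 = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided critical value Z_(alpha/2) from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z0) > z_crit
print(f"Z0 = {z0:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
# Z0 = 1.80, critical value = 1.96, reject H0: False
```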

P-value approach:

Steps:

(a) Formulate H₀ and H₁

H₀: μ=50      H₁: μ≠50

(b) Calculate the test statistic Z₀

Z₀ = (x̄ - μ₀)/(σ/√n) = (51.5 - 50)/(5/√36) = 1.8

(c) For the two-sided test, p = 2[1-Φ(|Z₀|)] = 2*(1-0.9641) = 2*0.0359 = 0.0718.

(d) Since the p-value 0.0718 > 0.05, the null hypothesis cannot be rejected.
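The p-value can also be computed directly rather than read from a table. The sketch below takes Z₀ = 1.8 from the example and uses `statistics.NormalDist` for Φ. Because it does not round Φ(1.8) to four decimals first, the result differs slightly in the last digit from the table-based 0.0718.

```python
from statistics import NormalDist

z0 = 1.8      # test statistic from the example
alpha = 0.05  # significance level

# Two-sided p-value: probability of a statistic at least as extreme as |Z0|.
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
# p-value = 0.0719, reject H0: False
```

Both approaches lead to the same decision: the p-value exceeds 0.05 exactly when |Z₀| is below the critical value 1.96.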