Chapter 14 Tests for a mean
Suppose we have a sample of size \(n\) from a population with unknown mean \(\mu\).
We want to test a null hypothesis about the value of \(\mu\), by modelling \(H_{0}\) as a box model with \(n\) draws.
14.1 1 sample t-test
In 2013, the average miles per gallon (mpg) for cars sold in the US was 23.6. We want to see if this is significantly different from the mtcars data.
14.1.1 H: Hypotheses
Formally, we write \(H_0: \mu = 23.6\) vs \(H_1: \mu \neq 23.6\).
Have a quick look at the data to see if \(H_0: \mu = 23.6\) seems plausible (e.g. right units and size).
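For example, a quick look in base R:
summary(mtcars$mpg)   # five-number summary plus the mean
mean(mtcars$mpg)      # sample mean, about 20.1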
- Now we model \(H_0: \mu = 23.6\) by a box with population mean 23.6, giving the EV of the sample mean as 23.6. We compare this to the observed sample mean of 20.1.
14.1.2 A: Check the Assumptions
For a \(t\) test (or \(Z\) test), we need to check the assumption of normality.
- First, look at the shape of the data.
Notice that the boxplot looks symmetric, which is consistent with normality, although the histogram shows some light right skew.
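A minimal sketch of these shape checks in base R (the titles are just our own labels):
boxplot(mtcars$mpg, horizontal = TRUE, main = "Boxplot of mpg")
hist(mtcars$mpg, main = "Histogram of mpg")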
- Next, try some more formal diagnostics.
Check whether the Q-Q plot looks linear, as this indicates normality. The Shapiro-Wilk test has the null hypothesis that the sample comes from a normal distribution. Here the p-value is quite large (0.1229), so we retain \(H_0\), which suggests normality.
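A sketch of these diagnostics in base R:
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)        # reference line: points close to this suggest normality
shapiro.test(mtcars$mpg)  # retains H0 of normality (p-value about 0.12)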
- Note: you can combine all four graphics in a 2x2 window, for easy comparison.
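For example (a sketch, assuming the four graphics are the boxplot, histogram, density plot and Q-Q plot):
par(mfrow = c(2, 2))                        # 2 x 2 grid of plots
boxplot(mtcars$mpg, main = "Boxplot of mpg")
hist(mtcars$mpg, main = "Histogram of mpg")
plot(density(mtcars$mpg), main = "Density of mpg")
qqnorm(mtcars$mpg); qqline(mtcars$mpg)
par(mfrow = c(1, 1))                        # reset the plotting window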
14.1.3 T: Calculate the Test Statistic
The formula for the \(t\)-test statistic is \(t_{obs} = \frac{\mbox{observed value - hypothesised value}}{\mbox{standard error}}\).
We use the SD of the data as an estimate of the SD of the population, so that we can calculate the SE of the sample mean.
- The degrees of freedom of the \(T\) test statistic is \(\mbox{sample size} - 1\).
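A sketch of the calculation by hand (the names n, se and tobs are just our own choices):
n = length(mtcars$mpg)                 # sample size, 32
se = sd(mtcars$mpg) / sqrt(n)          # SE of the sample mean, using the sample SD
tobs = (mean(mtcars$mpg) - 23.6) / se  # observed test statistic
tobs                                   # about -3.29, with n - 1 = 31 degrees of freedom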
14.1.4 P: Calculate the p-value
- The p-value is the probability of getting a value as extreme as \(t_{obs}\), or more extreme, in either tail of the \(T\) distribution.
- Note that the use of abs here allows for the test statistic to be in either tail, as in the sketch below.
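A sketch of the p-value calculation, reusing tobs and n from above:
pval = 2 * pt(-abs(tobs), df = n - 1)  # both tails of the t distribution with 31 df
pval                                   # about 0.0025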
14.1.5 C: Conclusion
- Since the p-value (0.002476) is smaller than the significance level (0.05), we reject the null hypothesis. This is the statistical conclusion.
- We then write a context-specific conclusion about cars: i.e. the mpg of current cars appears to have changed from that of the older US cars in mtcars.
14.2 The speedy way!
- We can do all this quickly in R with the built-in t.test() function!
- Match up the output below with the calculations above. Note it also gives us the 95% CI and the mean of the sample.
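The whole calculation above is reproduced by a single call:
t.test(mtcars$mpg, mu = 23.6)  # 1 sample t-test of H0: mu = 23.6 (two-sided by default)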
14.3 1 sample Z-Test
- The \(Z\)-Test is not in base R. This is because the \(Z\)-Test requires us to know the population variance \(\sigma^2\). Hence the \(t\)-Test is much more common.
- However, we could create a function called z.test ourselves.
- Assume the population variance of mtcars$mpg is the same as the sample variance (this of course is only a guess).
v = var(mtcars$mpg)
# This creates a function called z.test, with data, mu and var as the inputs, and zobs as the output.
z.test = function(data, mu, var){
zobs = (mean(data) - mu) / (sqrt(var / length(data)))
return(zobs)
}
# Run the function with inputs
z.test(mtcars$mpg, 23.6, v)
- Hence, the observed value of the \(Z\) test statistic is -3.293877, which is more than 3 standard errors below the hypothesised mean and so gives a tiny p-value (a quick check is sketched at the end of this section).
- Hence, again we conclude that the mpg of current cars appears to have changed significantly from that of the older US cars in mtcars.
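As a rough check (a sketch, reusing the z.test function and v defined above):
zobs = z.test(mtcars$mpg, 23.6, v)  # observed Z test statistic, about -3.29
2 * pnorm(-abs(zobs))               # two-sided p-value, about 0.001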