Chapter 14 Tests for a mean

  • Suppose we have a sample of size \(n\) from a population with unknown mean \(\mu\).

  • We want to test a null hypothesis \(H_{0}\) about the value of \(\mu\), by modelling \(H_{0}\) as a box model with \(n\) draws.



14.1 1 sample t-test

In 2013, the average miles per gallon (mpg) for cars sold in the US was 23.6. We want to test whether the mean mpg of the cars in the mtcars data differs significantly from this value.


14.1.1 H: Hypotheses

  • Formally, we write \(H_0: \mu = 23.6\) vs \(H_1: \mu \neq 23.6\).

  • Have a quick look to see whether \(H_0: \mu = 23.6\) seems plausible for the data (e.g. right units, right order of magnitude).

mean(mtcars$mpg)  # sample mean
length(mtcars$mpg) # size of sample 
  • Now we model \(H_0: \mu = 23.6\) by a box with population mean 23.6, giving the expected value (EV) of the sample mean as 23.6. We compare this to the observed sample mean of 20.1.


14.1.2 A: Check the Assumptions

For a \(t\) test (or \(Z\) test), we need to check the assumption of normality.

  • First, look at the shape of the data.
hist(mtcars$mpg)
boxplot(mtcars$mpg)

The boxplot looks roughly symmetric, which is consistent with normality, although the histogram shows some light right skew.


  • Next, try some more formal diagnostics.
qqnorm(mtcars$mpg)
shapiro.test(mtcars$mpg)

Check whether the Q-Q plot looks roughly linear, as this indicates normality. The Shapiro-Wilk test tests the null hypothesis that the sample comes from a Normal distribution. Here the p-value is quite big (0.1229), so we retain that null hypothesis: the data are consistent with normality. (Retaining it does not prove the data are Normal.)
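It is easier to judge whether the Q-Q plot is linear when a reference line is drawn on it, and easier to reuse the Shapiro-Wilk p-value when the test result is stored. A minimal sketch, assuming the object name sw (chosen here purely for illustration):

```r
# Q-Q plot with a reference line; points close to the line are consistent with normality
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)

# Store the Shapiro-Wilk result so its p-value can be used directly
# (sw is an illustrative name, not one used elsewhere in the chapter)
sw = shapiro.test(mtcars$mpg)
sw$p.value          # the p-value quoted in the text (0.1229)
sw$p.value > 0.05   # TRUE means normality is not rejected at the 5% level
```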


  • Note: you can combine the 3 graphics in one 2x2 window, for easy comparison.
par(mfrow=c(2,2))   # split the plotting window into a 2x2 grid
qqnorm(mtcars$mpg)
hist(mtcars$mpg)
boxplot(mtcars$mpg)
par(mfrow=c(1,1))   # reset to one plot per window


14.1.3 T: Calculate the Test Statistic

  • The formula for the \(t\)-test statistic is \(t_{obs} = \frac{\mbox{observed value - hypothesised value}}{\mbox{standard error}}\).

  • We use the SD of the data as an approximation of the SD of the population, so we can calculate the SE of the Sample Mean.

# SE of the sample mean = SD / sqrt(n); here n = 32
tobs = (mean(mtcars$mpg) - 23.6) / (sd(mtcars$mpg) / sqrt(length(mtcars$mpg)))
tobs


  • The degrees of freedom of the \(T\) test statistic is \(\mbox{sample size} - 1\).
length(mtcars$mpg)-1
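
As a rough check by hand, the calculation above can be written in symbols. The notation \(\bar{x}\) (sample mean), \(s\) (sample SD), \(n\) (sample size) and \(\mu_0\) (hypothesised mean) is introduced here for illustration only, and the numbers are rounded values of mean(mtcars$mpg) and sd(mtcars$mpg):

\[ t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \approx \frac{20.091 - 23.6}{6.027/\sqrt{32}} \approx \frac{-3.509}{1.065} \approx -3.29, \qquad \mbox{df} = n - 1 = 31. \]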


14.1.4 P: Calculate the p-value

  • The p-value is the probability, assuming \(H_0\) is true, of getting a value as extreme as \(t_{obs}\) or more extreme, in either tail of the \(T\) distribution.
2*(1-pt(abs(tobs),31))
  • Note that using abs here allows the test statistic to fall in either tail.
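
The same two-sided p-value can be computed from the lower tail, or with the lower.tail argument of pt, which avoids the subtraction from 1. A minimal sketch, recomputing the test statistic so it runs on its own (the names x, n, p_upper, p_lower and p_alt are illustrative):

```r
# Recompute the test statistic so this block runs on its own
x = mtcars$mpg
n = length(x)
tobs = (mean(x) - 23.6) / (sd(x) / sqrt(n))

# Three equivalent ways to get the two-sided p-value
p_upper = 2 * (1 - pt(abs(tobs), df = n - 1))                 # upper tail, doubled
p_lower = 2 * pt(-abs(tobs), df = n - 1)                      # lower tail, doubled
p_alt   = 2 * pt(abs(tobs), df = n - 1, lower.tail = FALSE)   # upper tail via lower.tail
c(p_upper, p_lower, p_alt)
```

All three agree because the \(T\) distribution is symmetric about 0.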


14.1.5 C: Conclusion

  • We compare the p-value (0.002476) to the significance level (0.05). Since the p-value is smaller, we reject the null hypothesis. This is the statistical conclusion.
  • We then write a context-specific conclusion on cars: i.e. the mean mpg of cars sold in the US in 2013 appears to differ from that of the older cars in mtcars.


14.2 The speedy way!

  • We can do all this quickly in R!
t.test(mtcars$mpg, mu = 23.6)
  • Match up this output with the calculations above. Note it also gives us the 95% CI and the mean of the sample.
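
To match the output up programmatically, the result of t.test can be stored and its components pulled out by name. A minimal sketch, assuming the object name res (chosen here for illustration):

```r
# Store the test result; t.test returns a list-like object of class "htest"
res = t.test(mtcars$mpg, mu = 23.6)

res$statistic   # observed t statistic, compare with tobs
res$parameter   # degrees of freedom (sample size - 1)
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for the mean
res$estimate    # sample mean
```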


14.3 1 sample Z-Test

  • The \(Z\)-Test is not in base R. This is because the \(Z\)-Test requires us to know the population variance \(\sigma^2\). Hence the \(t\)-Test is much more common.
  • However, we could create a function called z.test ourselves.
  • Assume the population variance of mtcars$mpg is the same as the sample variance (this of course is only a guess).
v = var(mtcars$mpg)
# This creates a function called z.test, with data, mu and var as the inputs, and zobs as the output.
z.test = function(data, mu, var){
   zobs = (mean(data) - mu) / (sqrt(var / length(data)))
   return(zobs)
}
# Run the function with inputs
z.test(mtcars$mpg,23.6,v)
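  • Hence, the observed value of the \(Z\) test statistic is -3.293877. This equals \(t_{obs}\), because we used the sample variance; only the reference distribution changes. It is more than 3 standard errors away from the hypothesised mean, so it will give a tiny p-value.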
2*(1-pnorm(abs(-3.293877)))
  • Hence again we conclude that the mpg of current cars appears to have changed significantly from the US older cars in mtcars.
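
The function above returns only the test statistic. A possible variant that also returns the two-sided p-value is sketched below; the name z.test.p and the returned element names are hypothetical, and the population variance is again only guessed from the sample:

```r
# Hypothetical helper (not part of base R): returns zobs and the two-sided p-value
z.test.p = function(data, mu, var){
   zobs = (mean(data) - mu) / sqrt(var / length(data))
   pval = 2 * pnorm(-abs(zobs))   # both tails of the standard Normal
   return(c(zobs = zobs, p.value = pval))
}

# Assumption: population variance taken to be the sample variance (only a guess)
out = z.test.p(mtcars$mpg, 23.6, var(mtcars$mpg))
out
```

Because the Normal distribution has lighter tails than the \(T\) distribution with 31 degrees of freedom, this p-value is smaller than the one from the \(t\)-test.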