Chapter 13 Test for a proportion (using simulation)
Suppose we have a sample of size \(n\) from a population with proportion \(p\) of a certain trait.
We want to test a null hypothesis about the value of \(p\). We model \(H_{0}\) with a box model with \(n\) draws, in which a ticket marked “1” represents the trait.
We simulate samples from the box model and then compare our actual sample to these results, i.e. how common would a sample like ours be if \(H_{0}\) were true?
13.1 Simple balanced box
We want to test \(H_{0}\): \(p = 0.5\).
We can produce a picture of the box model modelling \(H_{0}\).
library("DiagrammeR")
DiagrammeR::grViz("
digraph rmarkdown {
graph [fontsize = 16, fontname = Arial, nodesep = .1, ranksep = .8]
node [fontsize = 16, fontname = Arial, fontcolor = White]
edge [fontsize = 12, fontname = Arial, width = 2]
Box [shape=oval,style=filled, color=SteelBlue3,width=5, label='1 0']
Sample [shape=oval, style=filled, color=SteelBlue2, label='']
Box -> Sample [label=' n draws']
}
")
detach(package:DiagrammeR)
- Now simulate draws from the box, and compare to your sample. Here, suppose that \(n=20\), and choose a simulation size of 100.
set.seed(1)
# Define the box (modelling H0): one ticket marked 0 and one marked 1
box <- c(0, 1)
# Simulate 100 samples of size 20 from the box and record the total of each
totals <- replicate(100, sum(sample(box, 20, replace = TRUE)))
table(totals)
hist(totals)
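The comparison with the actual sample can be sketched as below. This is a minimal sketch, not a definitive procedure: the observed count of 15 ones is a hypothetical value chosen for illustration, and the two-sided comparison (distance from the expected total \(np = 10\)) is an assumption about the alternative hypothesis.

```r
set.seed(1)
box <- c(0, 1)
n <- 20

# Simulate 100 sample totals under H0: p = 0.5
totals <- replicate(100, sum(sample(box, n, replace = TRUE)))

# Hypothetical observed number of 1s in the actual sample (assumed, not from the text)
observed <- 15

# Expected total under H0 is n * p = 20 * 0.5 = 10
expected <- n * 0.5

# Proportion of simulated totals at least as far from 10 as the observed total
p_sim <- mean(abs(totals - expected) >= abs(observed - expected))
p_sim
```

A small value of `p_sim` suggests that a sample like ours is uncommon under \(H_{0}\). With only 100 simulations the estimate is coarse, so a larger simulation size gives a more stable answer.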
13.2 Unbalanced box
We want to test \(H_{0}\): \(p = 0.2\).
We can produce a picture of the box model modelling \(H_{0}\).
library("DiagrammeR")
DiagrammeR::grViz("
digraph rmarkdown {
graph [fontsize = 16, fontname = Arial, nodesep = .1, ranksep = .8]
node [fontsize = 16, fontname = Arial, fontcolor = White]
edge [fontsize = 12, fontname = Arial, width = 2]
Box [shape=oval,style=filled, color=SteelBlue3,width=5, label='100p x 1 100(1-p) x 0']
Sample [shape=oval, style=filled, color=SteelBlue2, label='']
Box -> Sample [label=' n draws']
}
")
detach(package:DiagrammeR)
- Now simulate draws from the box, and compare to your sample. Here, suppose that \(n=30\), and choose a simulation size of 1000.
set.seed(1)
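# Define the box (modelling H0): tickets 0 and 1, drawn with probabilities 0.8 and 0.2
box <- c(0, 1)
# Simulate 1000 samples of size 30 from the box and record the total of each
totals <- replicate(1000, sum(sample(box, 30, prob = c(0.8, 0.2), replace = TRUE)))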
table(totals)
hist(totals)
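As a check on the simulation, the simulated tail proportion can be set beside the exact binomial tail probability. This is a sketch under stated assumptions: the observed count of 10 ones is hypothetical, and a one-sided alternative (\(p > 0.2\)) is assumed for illustration.

```r
set.seed(1)
n <- 30
p0 <- 0.2

# Simulate 1000 sample totals under H0: p = 0.2
totals <- replicate(1000, sum(sample(c(0, 1), n, prob = c(1 - p0, p0), replace = TRUE)))

# Hypothetical observed number of 1s in the actual sample (assumed, not from the text)
observed <- 10

# Simulated one-sided p-value: proportion of totals at least as large as observed
p_sim <- mean(totals >= observed)

# Exact P(X >= observed) for X ~ Binomial(n, p0); pbinom with lower.tail = FALSE
# gives P(X > q), so we pass observed - 1
p_exact <- pbinom(observed - 1, size = n, prob = p0, lower.tail = FALSE)

c(simulated = p_sim, exact = p_exact)
```

The two values should be close, with agreement improving as the simulation size grows.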