Chapter 13 Test for a proportion (using simulation)

  • Suppose we have a sample of size n from a population with proportion p of a certain trait.

  • We want to test a null hypothesis for the value of p, by modelling H0 by a box model (with “1” representing the trait) with n draws.

  • We simulate samples from the box model, and then compare our actual sample to the simulated results, i.e. how unusual would our sample be if H0 were true?
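
The three steps above can be sketched in a few lines of R. This is only an illustrative outline: the helper name `simulate_totals`, the hypothesised value `p0 = 0.5`, and the observed total of 15 are assumptions made up for the example, not values from these notes.

```r
# Hypothetical helper: simulate the number of 1s in n draws from the H0 box
simulate_totals <- function(p0, n, n_sims = 1000) {
  box <- c(0, 1)  # "1" represents the trait
  replicate(n_sims, sum(sample(box, n, replace = TRUE, prob = c(1 - p0, p0))))
}

set.seed(1)
totals <- simulate_totals(p0 = 0.5, n = 20, n_sims = 1000)

# Assumed observed sample: 15 of the 20 units have the trait
observed <- 15

# Proportion of simulated totals at least as large as the observed total
p_sim <- mean(totals >= observed)
p_sim
```

A small value of `p_sim` suggests that a sample like ours would be rare if H0 were true.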


13.1 Simple balanced box

  • We want to test H0: p=0.5.

  • We can produce a picture of the box model modelling H0.

library("DiagrammeR")
  
DiagrammeR::grViz(" 
digraph rmarkdown {
  
graph [fontsize = 16, fontname = Arial, nodesep = .1, ranksep = .8]
node [fontsize = 16, fontname = Arial, fontcolor = White]
edge [fontsize = 12, fontname = Arial, width = 2]

Box [shape=oval,style=filled, color=SteelBlue3,width=5, label='1    0']

Sample [shape=oval, style=filled, color=SteelBlue2, label='']

Box -> Sample [label='   n draws']

}
")
detach(package:DiagrammeR)
  • Now simulate draws from the box, and compare to your sample. Here, suppose that n=20, and choose a simulation size of 100.
set.seed(1)

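# Define box (modelling H0): one 0 and one 1, so each is equally likely
box = c(0, 1)

# Simulate 100 samples of size 20 from the box (drawing with replacement)
# and record the number of 1s in each sample
totals = replicate(100, sum(sample(box, 20, replace = TRUE)))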
table(totals)
hist(totals)
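
To make the comparison with the sample concrete, one option is to count how often a simulated total lies at least as far from the expected total (n × 0.5 = 10) as the observed total does. The observed total of 14 below is a made-up value for illustration, and the two-sided comparison is one possible choice rather than the method prescribed by these notes.

```r
set.seed(1)

box <- c(0, 1)   # box modelling H0: p = 0.5
n <- 20

# Same simulation as above: 100 samples of size 20
totals <- replicate(100, sum(sample(box, n, replace = TRUE)))

observed <- 14        # hypothetical observed number of 1s
expected <- n * 0.5   # expected number of 1s under H0

# Proportion of simulated totals at least as far from the expected
# total as the observed total (in either direction)
p_two_sided <- mean(abs(totals - expected) >= abs(observed - expected))
p_two_sided
```

With only 100 simulations this proportion is fairly rough; a larger simulation size gives a more stable estimate.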


13.2 Unbalanced box

  • We want to test H0: p=0.2.

  • We can produce a picture of the box model modelling H0.

library("DiagrammeR")
  
DiagrammeR::grViz(" 
digraph rmarkdown {
  
graph [fontsize = 16, fontname = Arial, nodesep = .1, ranksep = .8]
node [fontsize = 16, fontname = Arial, fontcolor = White]
edge [fontsize = 12, fontname = Arial, width = 2]

Box [shape=oval,style=filled, color=SteelBlue3,width=5, label='100p x 1    100(1-p) x 0']

Sample [shape=oval, style=filled, color=SteelBlue2, label='']

Box -> Sample [label='   n draws']

}
")
detach(package:DiagrammeR)
  • Now simulate draws from the box, and compare to your sample. Here, suppose that n=30, and choose a simulation size of 1000.
set.seed(1)

# Define box (modelling H0)
box = c(0, 1)

# Simulate 1000 samples of size 30 from the box (drawing with replacement).
# prob follows the order of box: P(0) = 0.8, P(1) = 0.2
totals = replicate(1000, sum(sample(box, 30, prob = c(0.8, 0.2), replace = TRUE)))
table(totals)
hist(totals)
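
As a check on the simulation, the simulated totals can be set beside the exact binomial probabilities, since the number of 1s in n independent draws from this box follows a Binomial(30, 0.2) distribution. The sketch below is an added illustration; the names `sim_freq`, `exact`, and `comparison` are introduced here and are not part of the original notes.

```r
set.seed(1)

n <- 30
p0 <- 0.2
box <- c(0, 1)

# Same simulation as above: 1000 samples of size 30
totals <- replicate(1000, sum(sample(box, n, replace = TRUE, prob = c(1 - p0, p0))))

# Simulated relative frequency of every possible total 0, 1, ..., n
sim_freq <- table(factor(totals, levels = 0:n)) / length(totals)

# Exact probabilities of each total under H0
exact <- dbinom(0:n, size = n, prob = p0)

comparison <- data.frame(
  total     = 0:n,
  simulated = as.numeric(sim_freq),
  exact     = round(exact, 3)
)

# Show only the totals that carry noticeable probability
comparison[comparison$exact > 0.001, ]
```

The two columns should be close but not identical, because the simulated column is based on a finite number of samples.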