Chapter 6 Numerical Summaries
The numerical summary must match up with the type of variable(s).
Variable | Type of summary |
---|---|
1 Qualitative | frequency table, most common category |
1 Quantitative | mean, median, SD, IQR etc |
2 Qualitative | contingency table |
2 Quantitative | correlation, linear model |
1 Quantitative, 1 Qualitative | mean, median, SD, IQR etc across categories |
We’ll keep working with the mtcars dataset.
Remind yourself what it looks like.
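The commands that produced the output below are not shown in the text. The output is consistent with the following base R calls, so treat this as a likely reconstruction rather than the original code.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.
str(mtcars)     # structure: variable names, types and the first few values
dim(mtcars)     # number of rows and columns
head(mtcars)    # first six rows
# help(mtcars)  # opens the documentation page for the dataset
```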
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
## [1] 32 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## starting httpd help server ... done
6.1 Frequency and contingency tables
- A frequency table summarises 1 qualitative variable.
- A contingency table summarises 2 qualitative variables.
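A minimal sketch of both tables, assuming we treat gear and am as qualitative variables even though mtcars stores them as numbers. The variable names gear_counts and gear_by_am are chosen here for illustration.

```r
# Frequency table: number of cars for each number of gears.
gear_counts <- table(mtcars$gear)
gear_counts

# Most common category: the gear value with the largest count.
names(gear_counts)[which.max(gear_counts)]

# Contingency table: gears (rows) by transmission type am (columns).
gear_by_am <- table(mtcars$gear, mtcars$am)
gear_by_am
```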
6.2 Mean and median
- The mean and median measure centre for quantitative variables.
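For example, the centre of mpg can be measured both ways with base R. The choice of mpg is an assumption made here for illustration.

```r
# Mean: the sum of the values divided by the number of values.
mpg_mean <- mean(mtcars$mpg)
mpg_mean

# Median: the middle value of the sorted data.
mpg_median <- median(mtcars$mpg)
mpg_median
```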
6.3 Standard deviation (SD)
The standard deviation measures spread for quantitative variables.
The sd command calculates the sample standard deviation. The squared SD is the variance.
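As a quick check of that relationship, the sketch below compares the squared sample SD with the output of var. Using mpg is an illustrative choice.

```r
# Sample standard deviation of mpg (divides by n - 1).
mpg_sd <- sd(mtcars$mpg)
mpg_sd

# Sample variance of mpg.
mpg_var <- var(mtcars$mpg)
mpg_var

# The squared SD should match the variance up to floating-point error.
all.equal(mpg_sd^2, mpg_var)
```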
- The popsd command calculates the population standard deviation, but requires the multicon package.
#install.packages("multicon") # a package only needs to be installed once.
library(multicon)
popsd(mtcars$gear)
# Longer way: rescale the sample SD by sqrt((N-1)/N)
N = length(mtcars$gear)
sd(mtcars$gear)*sqrt((N-1)/N)
- Note: When we model a population by the box model [Section 8 and following], we will require the population SD.
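If the multicon package is not available, the population SD can also be computed in base R straight from its definition, the root mean square deviation from the mean. This is a sketch: pop_sd is a hypothetical helper name introduced here, not a function from any package.

```r
# Hypothetical helper: population SD (divides by N rather than N - 1).
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

x <- mtcars$gear
N <- length(x)

pop_sd(x)

# Should agree with rescaling the sample SD by sqrt((N - 1) / N).
all.equal(pop_sd(x), sd(x) * sqrt((N - 1) / N))
```

The population SD is always slightly smaller than the sample SD, and the gap shrinks as N grows.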
6.4 Interquartile range (IQR)
- The quickest method is to use IQR.
- There are lots of different methods of working out the quartiles. We can use the quantile command, and then work out the IQR.
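Both routes can be sketched as follows, using mpg as an illustrative variable. With the default settings of quantile, the two answers should agree.

```r
# Quick way: interquartile range in one call.
IQR(mtcars$mpg)

# Longer way: compute the quantiles (0%, 25%, 50%, 75%, 100% by default) ...
q <- quantile(mtcars$mpg)
q

# ... then subtract the lower quartile from the upper quartile.
iqr_by_hand <- unname(q["75%"] - q["25%"])
iqr_by_hand
```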
What is the 50% quantile equivalent to?
6.5 Summary
- The numerical summaries for quantitative variables can all be produced with summary, which is an expanded version of the 5 number summary. Sometimes these values will differ from those given by quantile, as there are different conventions for calculating quartiles.
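The sketch below puts summary next to a few other ways of getting quartiles, so the effect of the different conventions can be seen. The type argument of quantile selects the convention. The choice of type = 6 here is only an illustration.

```r
# Minimum, quartiles, median, mean and maximum in one call.
mpg_summary <- summary(mtcars$mpg)
mpg_summary

# Quartiles under the default convention of quantile.
quantile(mtcars$mpg)

# Quartiles under another convention; the 25% and 75% values may change.
quantile(mtcars$mpg, type = 6)

# Tukey's five number summary, which uses hinges rather than quantiles.
fivenum(mtcars$mpg)
```

Note that summary also rounds what it prints, which is another reason its output can look slightly different.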
- We can consider a subset of the data. Here, we choose the mpg of cars which have a weight greater than or equal to 3.
- Here we take all the rows of the mtcars dataset for a specific number of cylinders, e.g. 6.
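A minimal sketch of both subsets using logical indexing in base R. The names heavy_mpg and six_cyl are chosen here for illustration.

```r
# mpg values for cars whose weight (wt) is at least 3.
heavy_mpg <- mtcars$mpg[mtcars$wt >= 3]
heavy_mpg
summary(heavy_mpg)

# All columns for the cars with 6 cylinders.
# The comma matters: rows are selected before it, columns after it.
six_cyl <- mtcars[mtcars$cyl == 6, ]
six_cyl
```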