Chapter 6 Numerical Summaries
The numerical summary must match up with the type of variable(s).
Variable | Type of summary |
---|---|
1 Qualitative | frequency table, most common category |
1 Quantitative | mean, median, SD, IQR etc |
2 Qualitative | contingency table |
2 Quantitative | correlation, linear model |
1 Quantitative, 1 Qualitative | mean, median, SD, IQR etc across categories |
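As a rough illustration of the two-variable rows of the table, here is a minimal sketch using the built-in mtcars data. The choice of variables (mpg, wt and cyl) is an assumption made for the example, as is treating cyl as a qualitative variable.

```r
# Two quantitative variables: correlation and a simple linear model.
# Assumption: mpg and wt are picked only for illustration.
r <- cor(mtcars$mpg, mtcars$wt)
fit <- lm(mpg ~ wt, data = mtcars)
r
coef(fit)

# One quantitative, one qualitative: summarise mpg within each cyl group.
# Assumption: cyl is treated as a category, although it is stored as a number.
mean_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
sd_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, sd)
mean_by_cyl
sd_by_cyl
```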
We’ll keep working with the `mtcars` dataset.
So, again, remind yourself what it looks like.
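The listing that follows looks like the output of `str`, `dim` and `head`. The commands themselves are not shown, so the calls below are an assumption about what produced it.

```r
# Assumed commands; mtcars ships with base R (datasets package).
str(mtcars)        # structure: one line per variable
d <- dim(mtcars)   # number of rows and columns
d
head(mtcars)       # first six rows
```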
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
## [1] 32 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6.1 Frequency and contingency tables
- A frequency table summarises 1 qualitative variable.
- A contingency table summarises 2 qualitative variables.
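A minimal sketch of both kinds of table, using `table`. Treating cyl and am as qualitative variables is an assumption made for the example, since mtcars stores them as numbers.

```r
# Frequency table for one qualitative variable.
cyl_freq <- table(mtcars$cyl)
cyl_freq
most_common <- names(which.max(cyl_freq))   # most common category
most_common

# Contingency table for two qualitative variables.
cyl_by_am <- table(cyl = mtcars$cyl, am = mtcars$am)
cyl_by_am
```

Each cell of the contingency table counts the cars with that combination of cyl and am, so the cells add up to the number of rows in the data.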
6.3 Standard deviation (SD)
The standard deviation measures spread for quantitative variables.
The `sd` command calculates the sample standard deviation. The squared SD is the variance.
- The `popsd` command calculates the population standard deviation, but requires the `multicon` package.
- Note: When we model a population by the box model [Section 8 and following], we will require the population SD.
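The difference between the two SDs can be sketched in base R. The population SD is computed here by hand (dividing by n rather than n - 1) so that no extra package is needed. This is an illustration under that assumption, not a replacement for `popsd`, and the variable mpg is chosen only as an example.

```r
mpg <- mtcars$mpg
n <- length(mpg)

sample_sd <- sd(mpg)     # sample SD: divides by n - 1
sample_var <- var(mpg)   # variance: the squared sample SD

# Population SD by hand: root of the mean squared deviation (divides by n).
pop_sd <- sqrt(mean((mpg - mean(mpg))^2))

# Equivalent route: rescale the sample SD.
pop_sd_rescaled <- sample_sd * sqrt((n - 1) / n)

c(sample = sample_sd, population = pop_sd)
```

The population SD is always slightly smaller than the sample SD, and the gap shrinks as n grows.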
6.4 Interquartile range (IQR)
- The quickest method is to use `IQR`.
- There are lots of different methods of working out the quartiles. We can use the `quantile` command, and then work out the IQR.
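Both routes can be sketched as follows, again using mpg as an example variable (an assumption). The `type` argument of `quantile` selects the convention for calculating quantiles. Type 6 is shown only to illustrate that another convention can give different quartiles.

```r
mpg <- mtcars$mpg

# Quickest route.
iqr_direct <- IQR(mpg)

# Via the quartiles: 75% quantile minus 25% quantile.
q <- quantile(mpg, probs = c(0.25, 0.75))
iqr_by_hand <- unname(q["75%"] - q["25%"])

iqr_direct
iqr_by_hand

# A different quartile convention (illustrative choice of type).
q_type6 <- quantile(mpg, probs = c(0.25, 0.75), type = 6)
q_type6
```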
What is the 50% quantile equivalent to?
6.5 Summary
- The numerical summaries for quantitative variables can all be produced with `summary`, which is an expanded version of the 5 number summary. Sometimes these values will differ from those given by `quantile`, as there are different conventions for calculating quartiles.
- We can consider a subset of the data. Here, we choose the mpg of cars which have a weight greater than or equal to 3 (`wt` is recorded in units of 1000 lbs).
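A minimal sketch of both points, assuming mpg is the variable of interest: `summary` on a single variable, then the same variable restricted to the heavier cars with a logical condition.

```r
# Expanded 5 number summary (adds the mean).
s <- summary(mtcars$mpg)
s

# Subset: mpg of cars with wt >= 3.
heavy_mpg <- mtcars$mpg[mtcars$wt >= 3]
heavy_mpg
summary(heavy_mpg)
```

The condition `mtcars$wt >= 3` produces a TRUE/FALSE value for every row, and only the entries of mpg where it is TRUE are kept.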
Here we take all the rows of the `mtcars` dataset for a specific number of cylinders, e.g. 6.
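One way to sketch this is with row indexing. The name `six_cyl` is hypothetical, and `subset` is shown as an alternative that gives the same rows.

```r
# Keep every column, but only the rows where cyl equals 6.
six_cyl <- mtcars[mtcars$cyl == 6, ]
six_cyl

# Alternative with the same result.
six_cyl_alt <- subset(mtcars, cyl == 6)
```

Leaving the column position empty after the comma keeps all 11 variables for the selected cars.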