Chapter 4 Structure of data
We will use mtcars to illustrate the structure of data.
4.1 Classify variables
Recall there are 2 main types of variables: qualitative and quantitative, which R calls Factor and num respectively.
- View the data.
- Calculate the dimensions of the data set with dim().
This means that there are 32 rows (the types of cars) and 11 variables (properties of the cars).
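The two steps above can be sketched as follows. This is a minimal example that assumes the copy of mtcars shipped with base R; head() is used here only as a console-friendly stand-in for the RStudio viewer.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs to be downloaded.
data(mtcars)

# Print the first 6 rows in the console.
# In RStudio, View(mtcars) opens the whole data set in a spreadsheet-style tab.
head(mtcars)

# Dimensions: number of rows first, then number of columns.
data_dims <- dim(mtcars)
data_dims   # 32 11
```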
- List the names of the variables with names().
- See how R has classified the variables by viewing the structure of the data with str().
In the output, ‘num’ marks a quantitative (numerical) variable and ‘Factor’ marks a qualitative (categorical) variable.
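A short sketch of both commands, again assuming the built-in mtcars. Note that in this data set every column is stored as num, so no Factor appears in the str() output until a variable is reclassified.

```r
# Column (variable) names of the data frame.
variable_names <- names(mtcars)
variable_names

# Compact overview: one line per variable, showing its type and first few values.
str(mtcars)
```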
4.2 Isolate a variable
- Choose one variable from the data frame by using DataName$VariableName and store the result in a vector.
Note that RStudio has code completion, so it will auto-predict your commands. When you type mtcars$, the names of all the variables will come up.
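For instance, the miles-per-gallon column could be isolated like this. The vector name mpg is a choice made for this sketch (it matches the vector referred to later in the chapter); any valid name would work.

```r
# Extract the mpg column from the data frame and store it as a plain vector.
mpg <- mtcars$mpg

# Typing the name prints the stored values.
mpg
```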
- See the classification of 1 variable with class().
- See the length of 1 variable with length().
- Calculate the sum of a (quantitative) variable with sum().
- If at any command you get the answer NA, it means that you need to specify what to do with missing values. See Resource on how to solve this.
- Sort the data in increasing order with sort().
- Work out how to sort the data in decreasing order.
- Sum the 5 lowest values of the variable.
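The steps in this list can be sketched together as below. This is one possible solution rather than the only one. The vector with_missing is hypothetical, built only to show the NA behaviour, since mtcars itself contains no missing values.

```r
mpg <- mtcars$mpg

class(mpg)    # "numeric"
length(mpg)   # 32
total_mpg <- sum(mpg)

# Hypothetical vector with a missing value, to show why sum() can return NA.
with_missing <- c(mpg, NA)
sum(with_missing)                              # NA
total_without_na <- sum(with_missing, na.rm = TRUE)   # drop NAs before summing

# sort() returns increasing order by default; decreasing = TRUE reverses it.
increasing <- sort(mpg)
decreasing <- sort(mpg, decreasing = TRUE)

# The 5 lowest values are the first 5 elements of the increasing sort.
sum_lowest_5 <- sum(increasing[1:5])
sum_lowest_5
```

Most base R summary functions (mean(), max(), min() and so on) accept the same na.rm = TRUE argument.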
4.3 Select subset
- Pick the 1st and 5th elements of the vector mpg.
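A minimal sketch of this selection, assuming mpg was created from mtcars$mpg as in the previous section. Square brackets index a vector, and c() combines the positions wanted.

```r
mpg <- mtcars$mpg

# Positions 1 and 5 are passed as a single index vector.
picked <- mpg[c(1, 5)]
picked
```

A negative index does the opposite: mpg[-c(1, 5)] returns everything except those two elements.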
4.4 Change classification
- You may not agree with R’s initial classification, and want to change it.
- For example, note that the number of carburetors carb is classified as num. Reclassify carb as a factor.
- To change from a factor to a num:
ageCanVote = factor(setNames(c(16, 18, 18, "Unknown"), c("Austria", "Australia", "Afghanistan", "Zambia")))
as.numeric(ageCanVote) # This is a mistake, as it returns the internal integer code of each factor level, not the age
as.numeric(as.character(ageCanVote)) # This converts properly; "Unknown" becomes NA with a warning
Note:
(1) The warning message is not a problem - it is just alerting you to the introduction of NAs.
(2) Notice the mistake above that occurs if you just use as.numeric() on its own.
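Returning to the carb example, the reclassification in both directions might look like the sketch below. The copy named mtcars2 is an assumption of this example, used so that the built-in data set is left unchanged.

```r
# Work on a copy (hypothetical name) rather than overwriting mtcars.
mtcars2 <- mtcars

class(mtcars2$carb)                       # "numeric"

# num -> factor
mtcars2$carb <- as.factor(mtcars2$carb)
class(mtcars2$carb)                       # "factor"
levels(mtcars2$carb)                      # the distinct carburetor counts, as labels

# factor -> num: go through as.character() so the labels, not the codes, are converted.
carb_back <- as.numeric(as.character(mtcars2$carb))
carb_codes <- as.numeric(mtcars2$carb)    # internal level codes, NOT the original values
```

Comparing carb_back and carb_codes with mtcars$carb shows the difference: only carb_back recovers the original counts.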