Chapter 4 Structure of data

We will use the built-in mtcars data set to illustrate the structure of data.


4.1 Classify variables

  • Recall that there are 2 main types of variables: qualitative (categorical) and quantitative (numerical). R usually stores these as factors (shown as Factor) and numeric vectors (shown as num).

  • View the data.

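mtcars                # print the whole data set
head(mtcars)          # first 6 rows
tail(mtcars, n = 3)   # last 3 rows
help(mtcars)          # documentation: what each variable measures

  • Calculate the dimensions of the data set.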
dim(mtcars)

This means that there are 32 rows (one per car model) and 11 columns (the variables measured on each car).
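dim() returns both numbers at once. The base R functions nrow() and ncol() return them separately, which is handy when you only need one of them. A minimal sketch (the name dims is just illustrative):

```r
dims = dim(mtcars)   # integer vector: number of rows, then number of columns
dims
nrow(mtcars)         # number of rows (cars)
ncol(mtcars)         # number of columns (variables)
```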


  • List the names of the variables.
names(mtcars)


  • See how R has classified the variables by viewing the structure of the data.
str(mtcars)

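where ‘num’ is a quantitative (numerical) variable and ‘Factor’ would be a qualitative (categorical) variable. In mtcars every variable is stored as num, even those such as cyl, vs and am that are really categorical.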
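str() prints a summary. If you want the class of every column as a vector, one option is to apply class() to each column with sapply(). This sketch (the name varClasses is illustrative) confirms that mtcars contains no factors:

```r
varClasses = sapply(mtcars, class)   # named character vector, one entry per column
varClasses
table(varClasses)                    # how many columns of each class
any(sapply(mtcars, is.factor))       # FALSE: no column is a factor
```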


4.2 Isolate a variable

  • Choose one variable from the data frame by using DataName$VariableName and store the result in a vector.
mpg = mtcars$mpg

Note that RStudio has code completion, so it will suggest names as you type: when you type mtcars$, the names of all the variables will come up.
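Besides $, base R offers two other ways to pull out a single column of a data frame. This sketch checks that all three give the same vector; the names mpg2 and mpg3 are only for illustration.

```r
mpg  = mtcars$mpg           # $ followed by the variable name
mpg2 = mtcars[["mpg"]]      # [[ ]] with the name as a string
mpg3 = mtcars[, "mpg"]      # [row, column] with all rows selected
identical(mpg, mpg2)        # TRUE
identical(mpg, mpg3)        # TRUE
```

The [[ ]] form is useful when the variable name is itself stored in a string, for example v = "mpg" followed by mtcars[[v]].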


  • See the classification of a single variable.
class(mpg)
str(mpg)


  • See the length of a single variable.
length(mpg)
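Every column of a data frame has one entry per row, so the length of an isolated variable should equal the number of rows. A quick check, recreating mpg so the snippet runs on its own:

```r
mpg = mtcars$mpg
class(mpg)                     # "numeric"
is.numeric(mpg)                # TRUE
length(mpg) == nrow(mtcars)    # TRUE: one value per car
```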


  • Calculate the sum of a (quantitative) variable.
sum(mpg)
  • If a command returns NA, the variable probably contains missing values, and you need to tell R what to do with them. See Resource on how to solve this.
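mtcars itself has no missing values, so this sketch uses a small made-up vector x (hypothetical data) to show the problem and the na.rm argument of sum(), which drops missing values before adding:

```r
x = c(12.5, NA, 7.5)      # hypothetical data with one missing value
sum(x)                    # NA: R cannot add an unknown value
sum(x, na.rm = TRUE)      # 20: the NA is removed before summing
anyNA(mtcars$mpg)         # FALSE: mpg has no missing values
```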


  • Sort the data in increasing order.
sort(mpg)


  • Work out how to sort the data in decreasing order.
sort(mpg, decreasing = TRUE)   # write TRUE in full: T is an ordinary variable and can be overwritten
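Another way to get decreasing order is to sort in increasing order and then reverse the result with rev(). This sketch (names down1 and down2 are illustrative) checks that both approaches agree:

```r
mpg = mtcars$mpg
down1 = sort(mpg, decreasing = TRUE)
down2 = rev(sort(mpg))       # sort ascending, then reverse
identical(down1, down2)      # TRUE
head(down1, 3)               # the three highest values
```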


  • Sum the 5 lowest values of the variable.
sum(sort(mpg)[1:5])
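Instead of indexing with [1:5], you can take the first or last few values of the sorted vector with head() and tail(). A sketch for the 5 lowest and 5 highest values of mpg:

```r
mpg = mtcars$mpg
sum(head(sort(mpg), 5))   # 5 lowest values: 10.4 + 10.4 + 13.3 + 14.3 + 14.7 = 63.1
sum(sort(mpg)[1:5])       # same result, using indexing as above
sum(tail(sort(mpg), 5))   # 5 highest values: 27.3 + 30.4 + 30.4 + 32.4 + 33.9 = 154.4
```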


4.3 Select subset

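  • Pick the 1st and 5th elements of the vector mpg.
mpg[1]                # 1st element
mpg[5]                # 5th element
mpg[c(1,5)]           # both at once
mtcars$mpg[c(1,5)]    # the same, without storing mpg first
mtcars[1,1]           # row 1, column 1 (mpg is the 1st column)
mtcars[5,1]           # row 5, column 1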
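Rows and columns of a data frame can also be selected by name rather than by position. In mtcars the row names are the car models, so this sketch picks the same two values in several ways:

```r
rownames(mtcars)[c(1, 5)]                            # "Mazda RX4" "Hornet Sportabout"
mtcars[c(1, 5), "mpg"]                               # rows by position, column by name
mtcars[c("Mazda RX4", "Hornet Sportabout"), "mpg"]   # rows and column by name
mtcars[c(1, 5), ]                                    # all columns for cars 1 and 5
```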


4.4 Change classification

  • You may not agree with R’s initial classification, and want to change it.
str(mtcars)
  • For example, the number of carburetors, carb, is classified as num. Reclassify it as a factor.
class(mtcars$carb)            # "numeric"
carbF = factor(mtcars$carb)   # store the factor version in a new vector
class(carbF)                  # "factor"
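factor() returns a new vector, so mtcars$carb itself stays numeric. To reclassify the variable inside a data frame, you can assign the factor back to the column. This sketch works on a copy called mtcars2 (a hypothetical name) so the built-in data set is left untouched:

```r
carbF = factor(mtcars$carb)
levels(carbF)                         # "1" "2" "3" "4" "6" "8"
table(carbF)                          # number of cars with each carburetor count
class(mtcars$carb)                    # still "numeric"
mtcars2 = mtcars                      # copy of the data set
mtcars2$carb = factor(mtcars2$carb)   # replace the column with its factor version
str(mtcars2$carb)
```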
  • To change from a factor to a num:
# Because of "Unknown", c() turns the numbers into text, so the levels are "16", "18", "Unknown".
ageCanVote = factor(setNames(c(16, 18, 18, "Unknown"), c("Austria", "Australia", "Afghanistan", "Zambia")))
as.numeric(ageCanVote)  # Mistake: returns the level codes (1 2 2 3), not the ages
as.numeric(as.character(ageCanVote))  # Correct: converts the level labels, giving 16 18 18 NA

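Note: (1) The warning message (NAs introduced by coercion) is not a problem; it just tells you that "Unknown", which is not a number, has become NA. (2) Using as.numeric() directly on a factor is the mistake shown above: it returns the level codes rather than the original values.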
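The R documentation for factor() also suggests a variant that converts each level only once and then indexes by the factor's codes, which can be more efficient for long factors. A sketch, recreating ageCanVote so it runs on its own (ageNum is an illustrative name; suppressWarnings() hides the expected coercion warning):

```r
ageCanVote = factor(setNames(c(16, 18, 18, "Unknown"),
                             c("Austria", "Australia", "Afghanistan", "Zambia")))
levels(ageCanVote)       # "16" "18" "Unknown"
as.integer(ageCanVote)   # internal level codes: 1 2 2 3
# Convert the levels to numbers once, then pick one per element using the codes.
ageNum = suppressWarnings(as.numeric(levels(ageCanVote)))[ageCanVote]
ageNum                   # 16 18 18 NA
```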