Z Scores

Z Scores provide a standardized way to describe where a value falls within the distribution of data. A Z Score is simple to compute and can be calculated by hand.

A Z Score tells us how many standard deviations a given value lies above or below the mean of a given set of data.

In the example below, we will calculate the Z Score for women’s height in the women dataset from the {datasets} package. The women dataset consists of 15 observations and two variables. We will focus on the height variable, which is numeric and measures height in inches.

To calculate a Z Score, we first need the mean and standard deviation of height in these data. We then apply the following formula to each value:

\[ Z = (X - \mu) / \sigma\]

Where:

  • X is a single raw data point

  • \(\mu\) is the mean of the variable in our dataset

  • \(\sigma\) is the standard deviation of the variable in our dataset (computed below with R's sd(), which uses the sample formula with an n - 1 denominator)
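To make the formula concrete, here is a minimal sketch that applies it by hand to a single value, the shortest height in the women data (58 inches). The variable names heights, mu, sigma, x, and z are illustrative and do not come from the original example.

```r
# The built-in `women` dataset ships with the {datasets} package.
heights <- women$height

mu    <- mean(heights)  # mean of height
sigma <- sd(heights)    # sample standard deviation (n - 1 denominator)

x <- 58                 # a single raw data point
z <- (x - mu) / sigma   # the Z Score formula

round(z, 4)
#> [1] -1.5652
```

The result says that 58 inches sits a little more than one and a half standard deviations below the mean height.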

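library(dplyr)  # provides %>%, select(), and mutate()

women %>% 
  select(height) %>% 
  # standardize: subtract the mean, then divide by the standard deviation
  mutate(z_score_height = (height - mean(height)) / sd(height))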
##    height z_score_height
## 1      58     -1.5652476
## 2      59     -1.3416408
## 3      60     -1.1180340
## 4      61     -0.8944272
## 5      62     -0.6708204
## 6      63     -0.4472136
## 7      64     -0.2236068
## 8      65      0.0000000
## 9      66      0.2236068
## 10     67      0.4472136
## 11     68      0.6708204
## 12     69      0.8944272
## 13     70      1.1180340
## 14     71      1.3416408
## 15     72      1.5652476

Now we have a Z Score for each measure of height. If we calculated the mean and standard deviation of the Z Scores, we would find that the mean is zero and the standard deviation is one.
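This property can be checked directly. The sketch below recomputes the Z Scores in base R (the names heights and z_scores are illustrative) and compares their mean and standard deviation with zero and one. It uses all.equal() rather than == because floating-point arithmetic may leave a tiny rounding error.

```r
heights  <- women$height
z_scores <- (heights - mean(heights)) / sd(heights)

# Compare with a numerical tolerance instead of exact equality
mean_is_zero <- isTRUE(all.equal(mean(z_scores), 0))
sd_is_one    <- isTRUE(all.equal(sd(z_scores), 1))

mean_is_zero
sd_is_one
```

Both checks should return TRUE. This holds for any variable standardized with its own mean and standard deviation, not just height.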

We can see that 65 inches is the mean women’s height in our data, since its Z Score is zero. Sixty inches is slightly more than one standard deviation below the mean (Z Score of about -1.12), and 70 inches is slightly more than one standard deviation above the mean (Z Score of about 1.12).