Shapiro-Wilk Test for Normality in R

I think the Shapiro-Wilk test is a great way to check whether a variable is plausibly normally distributed. Normality is an assumption behind many statistical models and tests, so it matters both when building a model and when evaluating one.

Let’s look at how to do this in R!

shapiro.test(data$CreditScore)

And here is the output:

Shapiro-Wilk normality test
data:  data$CreditScore
W = 0.96945, p-value = 0.2198

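So how do we read this? The null hypothesis of the Shapiro-Wilk test is that the data come from a normally distributed population. At first glance the p-value might look too high, but it is not: the usual threshold is 0.05, and 0.2198 is above it. So here we fail to reject the null hypothesis. We don’t have enough evidence to say the population is not normally distributed.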
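To make the decision rule concrete, here is a minimal sketch. The original `data` frame is not shown in this post, so the sketch uses simulated scores; the `credit_score` vector, the seed, and the `alpha` and `decision` names are assumptions for illustration only.

```r
# Simulated stand-in for data$CreditScore (assumption: the real data is not shown)
set.seed(42)
credit_score <- rnorm(50, mean = 650, sd = 50)

# shapiro.test() returns an "htest" object; the p-value is stored in $p.value
result <- shapiro.test(credit_score)
alpha <- 0.05  # the threshold used in this post

# Null hypothesis: the sample comes from a normally distributed population
if (result$p.value < alpha) {
  decision <- "reject the null hypothesis of normality"
} else {
  decision <- "fail to reject the null hypothesis of normality"
}

cat("W =", round(result$statistic, 4),
    "p-value =", round(result$p.value, 4),
    "->", decision, "\n")
```

Pulling the p-value out of the result like this is handy when you want to check several variables in a loop instead of reading each printout by eye.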

Let’s make a histogram to take a look using base R graphics:

hist(data$CreditScore, 
     main="Credit Score", 
     xlab="Credit Score", 
     border="light blue", 
     col="blue", 
     las=1, 
     breaks=5)

Our distribution looks nice here.

Great! I would feel comfortable making more assumptions and performing some tests.
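A histogram with only a few breaks can hide departures from normality, so a normal Q-Q plot is a common second visual check. This is a rough sketch using base R graphics on simulated data; the `credit_score` vector and the seed are assumptions standing in for `data$CreditScore`.

```r
# Simulated stand-in for data$CreditScore (assumption: the real data is not shown)
set.seed(42)
credit_score <- rnorm(50, mean = 650, sd = 50)

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates
qq <- qqnorm(credit_score, main = "Normal Q-Q Plot: Credit Score")

# qqline() adds a reference line through the first and third quartiles
qqline(credit_score, col = "blue")
```

If the points fall close to the reference line, the sample is roughly consistent with a normal distribution.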

5 thoughts on “Shapiro-Wilk Test for Normality in R”

  1. I would suggest discussing Royston’s extension of the SW test, which covers sample sizes larger than 50.

    The Shapiro-Wilk and Anderson-Darling tests have better power at a given significance level than the Kolmogorov-Smirnov test or the Lilliefors test (an adaptation of the Kolmogorov-Smirnov test).

    Normality tests are often applied to independent variables (predictors), although most statistical models, such as regression, make no strong assumptions about the predictors. Their assumptions concern the differences (as in a Bland-Altman plot) or the residuals.
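The point about residuals can be sketched in a few lines of R. This is only an illustration on simulated data: the variables `x` and `y`, the seed, and the model are assumptions, not anything from the post. The predictor is drawn from a skewed distribution on purpose, while the noise is normal.

```r
# Simulated example (assumption: none of these variables come from the post)
set.seed(1)
x <- rexp(100, rate = 0.5)        # skewed predictor, not normally distributed
y <- 2 + 3 * x + rnorm(100)       # linear relationship with normal noise

fit <- lm(y ~ x)

# Testing the predictor itself says little about the regression assumptions
predictor_test <- shapiro.test(x)

# The normality assumption in linear regression concerns the residuals
residual_test <- shapiro.test(residuals(fit))

cat("Predictor p-value:", signif(predictor_test$p.value, 3), "\n")
cat("Residual p-value: ", signif(residual_test$p.value, 3), "\n")
```

With a setup like this, the test on `x` will typically reject normality even though the model's assumption about its errors holds by construction.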
