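I think the Shapiro-Wilk test is a great way to check whether a variable is plausibly normally distributed. Normality is an assumption behind many parametric methods (t-tests, or the residuals in a linear regression, for example), so it is worth checking before building or evaluating those kinds of models.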
Let’s look at how to do this in R!
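Here is a minimal sketch of the call. It assumes a data frame named data with a numeric CreditScore column, which are the names that appear in the output. The simulated values are only a hypothetical stand-in for the real data set, so running this exact snippet will not reproduce the numbers shown in this post.

```r
# Hypothetical stand-in data: replace this with your own data frame.
set.seed(42)
data <- data.frame(CreditScore = rnorm(50, mean = 680, sd = 50))

# Shapiro-Wilk test. Null hypothesis: the sample comes from a normal distribution.
# shapiro.test() accepts between 3 and 5000 non-missing values.
result <- shapiro.test(data$CreditScore)
print(result)
```

With your own data loaded, the only line you need is shapiro.test(data$CreditScore).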
And here is the output:
Shapiro-Wilk normality test

data:  data$CreditScore
W = 0.96945, p-value = 0.2198
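So how do we read this? The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution. At first glance the p-value might look too high, but a high p-value is not a problem here. The usual threshold is 0.05, and our p-value of 0.2198 is above it, so we fail to reject the null hypothesis. We don’t have enough evidence to say the population is not normally distributed. Keep in mind that this is not the same as proving the data are normal.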
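The decision rule can be sketched in a few lines of R. The helper name normality_decision is hypothetical (it is not part of base R), and the 0.05 default is just the conventional threshold used in this post.

```r
# Hypothetical helper: turn a Shapiro-Wilk p-value into a decision.
# alpha is the significance threshold (0.05 by convention).
normality_decision <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject H0"          # evidence against normality
  } else {
    "fail to reject H0"  # no evidence against normality
  }
}

# p-value taken from the test output in this post
decision <- normality_decision(0.2198)
print(decision)
```

If you keep the test result in a variable, you can pass its p.value element to the helper instead of typing the number by hand.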
Let’s make a histogram to take a look using base R graphics:
hist(data$CreditScore, main="Credit Score", xlab="Credit Score", border="light blue", col="blue", las=1, breaks=5)
Our distribution looks nice here:
Great! With both the test and the histogram pointing the same way, I would feel comfortable moving on to tests that assume normality.