# Shapiro-Wilk Test for Normality in R

I think the Shapiro-Wilk test is a great way to check whether a variable is normally distributed. Normality is an important assumption behind many statistical models and tests, and it also matters when you evaluate them.

Let’s look at how to do this in R!

```
shapiro.test(data$CreditScore)
```

And here is the output:

```
Shapiro-Wilk normality test
data:  data$CreditScore
W = 0.96945, p-value = 0.2198
```

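So how do we read this? The null hypothesis of the Shapiro-Wilk test is that the sample comes from a normally distributed population. At first glance the p-value may look too high, but a high p-value is what we want here: 0.2198 is above the usual 0.05 threshold, so we fail to reject the null hypothesis. We don’t have enough evidence to say the population is not normally distributed.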
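The same decision rule can be sketched in code. This is a minimal example on simulated data: the `credit_score` vector and the `alpha` variable are stand-ins invented for illustration, not the data behind the output shown earlier.

```r
# Simulated stand-in for data$CreditScore (hypothetical values, not the post's data)
set.seed(42)
credit_score <- rnorm(50, mean = 680, sd = 40)

result <- shapiro.test(credit_score)
alpha <- 0.05  # conventional significance threshold

# result$p.value holds the p-value; result$statistic holds W
if (result$p.value > alpha) {
  message("Fail to reject H0: no evidence against normality")
} else {
  message("Reject H0: the sample is unlikely to come from a normal distribution")
}
```

Storing the result in a variable makes it easy to reuse the p-value later, for example when checking several columns in a loop.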

Let’s make a histogram to take a look using base R graphics:

```
hist(data$CreditScore,
     main="Credit Score",
     xlab="Credit Score",
     border="light blue",
     col="blue",
     las=1,
     breaks=5)
```

Our distribution looks nice here.

Great! With no evidence against normality, I would feel comfortable moving on to tests that assume it.

# Dollar Signs and Percentages: 3 Different Ways to Convert Data Types in R

Working with percentages in R can be a little tricky, but it’s easy to convert them to integer or numeric values so that you can run the right statistics on them, such as quartiles and the mean rather than frequencies.

```
data$column = as.integer(sub("%", "", data$column))
```

Essentially, you are using the sub function to replace the “%” with an empty string. Be aware that as.integer drops anything after the decimal point, so use as.numeric instead if your percentages have decimals. In the end, just remember that those values are percentages.
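To see what the conversion does to values with a decimal part, here is a small sketch on a made-up vector (`utilization` and `stripped` are hypothetical names, not columns from the original data):

```r
# Hypothetical percentage strings, one with a decimal part
utilization <- c("25%", "70%", "12.5%")

# fixed = TRUE treats "%" as a literal string rather than a regular expression
stripped <- sub("%", "", utilization, fixed = TRUE)

as.integer(stripped)  # 25 70 12    (the .5 is dropped)
as.numeric(stripped)  # 25.0 70.0 12.5
```

If your percentages are always whole numbers, either function works; as.numeric is the safer choice when you are not sure.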

The next example converts a column to a factor:

```
data$column = as.factor(data$column)
```

Now R treats the column as a set of discrete categories. This is great for categorical and nominal-level variables.
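As a quick illustration of what a factor gives you, here is a sketch on a made-up character vector (`card_type` and its values are invented for this example):

```r
# Hypothetical categorical column stored as plain character strings
card_type <- c("gold", "silver", "gold", "platinum", "silver", "gold")

card_factor <- as.factor(card_type)

levels(card_factor)   # the distinct categories
table(card_factor)    # frequency of each category
summary(card_factor)  # also reports counts, because the column is a factor
```

For a factor, summary() reports counts per level instead of quartiles and a mean, which is usually what you want for this kind of variable.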

The last example converts to numeric. If you have a variable that contains a dollar sign and commas, use this to change it to a number.

```
# Remove the thousands separators first, then the dollar sign, and convert to numeric
data$Balance = gsub(",", "", data$Balance)
data$Balance = as.numeric(gsub("\\$", "", data$Balance))
```

Check out the structure of the columns before:

```
Balance      : Factor w/ 40 levels "$1,000","$10,000",..:
Utilization  : Factor w/ 31 levels "100%","11%","12%",
```

And after:

```
Balance      : num  11320 7200 20000 12800 5700 ...
Utilization  : int  25 70 55 65 75
```
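If you want to try the whole before-and-after on your own machine, here is a minimal sketch. The data frame is made up for illustration, and `stringsAsFactors = TRUE` is an assumption used to get factor columns like the ones in the "before" output.

```r
# Made-up data frame; stringsAsFactors = TRUE turns the text columns into factors
data <- data.frame(
  Balance     = c("$11,320", "$7,200", "$20,000"),
  Utilization = c("25%", "70%", "55%"),
  stringsAsFactors = TRUE
)
str(data)  # both columns are factors

# gsub() and sub() coerce the factor to character first, so the labels get converted.
# Calling as.numeric() directly on a factor would return its internal level codes instead.
data$Balance     <- as.numeric(gsub("[$,]", "", data$Balance))
data$Utilization <- as.integer(sub("%", "", data$Utilization, fixed = TRUE))
str(data)  # Balance is num, Utilization is int
```

The character class `[$,]` removes dollar signs and commas in a single pass, which is a compact alternative to calling gsub twice.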

I hope this helps you with your formatting! It’s simple and easy, and you’ll be able to summarize your data!