Unsupervised Machine Learning in R: K-Means

K-Means clustering is a form of unsupervised machine learning because there is no target variable to predict. Clustering can be used to create a target variable, or simply to group data by shared characteristics.

Here’s a simple way to use R to find clusters, visualize them, and then tie them back to the source data to support a marketing strategy.

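#set the working directory to the folder that holds AbcBank.csv
#("path/to/data" is a placeholder, not a real path)
setwd("path/to/data")
#import dataset
ABC <- read.table("AbcBank.csv", header=TRUE,
                  sep=",")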

#choose variables to be clustered
#make sure to exclude ID fields or dates
ABC_num <- ABC[,2:5]
#scale the data so every variable has mean 0 and standard deviation 1
ABC_scaled <- as.data.frame(scale(ABC_num))

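#kmeans function
#k-means starts from random centers, so fix a seed to get repeatable results
#(any fixed value works)
set.seed(123)
k3 <- kmeans(ABC_scaled, centers=3, nstart=25)
#factoextra provides fviz_cluster() for the visualization
library(factoextra)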
fviz_cluster(k3, data=ABC_scaled,
             ellipse.type="convex",
             axes =c(1,2),
             geom="point",
             label="none",
             ggtheme=theme_classic())
#check out the centers
#remember these are on the standardized scale, but a higher center
#still means higher values in the original data
k3$centers
#add the cluster to the original dataset!
ABC$Cluster <- as.numeric(k3$cluster)
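
Because the centers come back on the standardized scale, it can help to convert them to the original units before describing each cluster. The sketch below is one way to do that, not the only one. It uses the built-in iris data as a stand-in for AbcBank.csv, and the names num, scaled, km, centers_orig and cluster_means are made up for this example. It relies on the centering and scaling values that scale() stores as attributes.

```r
# Stand-in data: the four numeric columns of the built-in iris dataset
# (an assumption for illustration; swap in your own numeric columns).
num <- iris[, 1:4]
scaled <- scale(num)   # matrix that keeps the column means and sds as attributes

set.seed(42)           # k-means starts from random centers
km <- kmeans(scaled, centers = 3, nstart = 25)

# Undo the standardization: multiply by each column's sd, then add its mean.
col_sd   <- attr(scaled, "scaled:scale")
col_mean <- attr(scaled, "scaled:center")
centers_orig <- sweep(km$centers, 2, col_sd, "*")
centers_orig <- sweep(centers_orig, 2, col_mean, "+")

# Cross-check: per-cluster means of the unscaled columns.
cluster_means <- aggregate(num, by = list(Cluster = km$cluster), FUN = mean)

print(round(centers_orig, 2))
print(cluster_means)
```

The two printed tables should agree, since each k-means center is the mean of the points assigned to it. The aggregate() call is also a quick way to profile clusters once the cluster number has been added to the original dataset.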
    

Check out our awesome clusters:

Repo here with dataset: https://github.com/emileemc/kmeans

Easy R: Summary statistics grouping by a categorical variable

Once I found the purrr package, which lets me run base R's summary() function separately for each group in a dataset, it was a game changer.

Combined with split(), this library gives summary statistics for every variable, grouped by a categorical variable. The result is a list, so it can also be saved with an assignment.

library(purrr)
credit %>% split(credit$Date) %>% map(summary)

Simply pass datatable$column, where column is the categorical variable, to split(), then use the map function to run summary on each group. And that’s it! All set to produce results like these:

$Aug
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :485.0   Min.   : 2.00          
 1st Qu.:0.0000   1st Qu.:545.5   1st Qu.: 5.50          
 Median :0.0000   Median :591.0   Median : 9.00          
 Mean   :0.3704   Mean   :601.6   Mean   :10.33          
 3rd Qu.:1.0000   3rd Qu.:630.0   3rd Qu.:14.50          
 Max.   :1.0000   Max.   :811.0   Max.   :22.00          
                                                         
 Revolving.Balance Revolving.Utilization    Approval        Loan.Amount
 $2,000  : 2       100%   : 3            Min.   :0.0000   $11,855 : 1  
 $27,000 : 2       65%    : 2            1st Qu.:0.0000   $12,150 : 1  
 $29,100 : 2       70%    : 2            Median :0.0000   $13,054 : 1  
 $1,000  : 1       78%    : 2            Mean   :0.1481   $15,451 : 1  
 $10,500 : 1       79%    : 2            3rd Qu.:0.0000   $16,218 : 1  
 $12,050 : 1       85%    : 2            Max.   :1.0000   $17,189 : 1  
 (Other) :18       (Other):14                             (Other) :21  
   Date    Default
 Aug :27   0:14   
 July: 0   1:13   
                  
                                         
$July
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :620.0   Min.   : 2.0           
 1st Qu.:0.5000   1st Qu.:682.5   1st Qu.: 8.0           
 Median :1.0000   Median :701.0   Median :12.0           
 Mean   :0.7391   Mean   :711.8   Mean   :12.3           
 3rd Qu.:1.0000   3rd Qu.:746.5   3rd Qu.:16.5           
 Max.   :1.0000   Max.   :802.0   Max.   :24.0           
                                                         
 Revolving.Balance Revolving.Utilization    Approval        Loan.Amount
 $11,200 : 2       11%    : 2            Min.   :0.0000   $3,614  : 2  
 $11,700 : 2       15%    : 2            1st Qu.:1.0000   $12,303 : 1  
 $6,100  : 2       20%    : 2            Median :1.0000   $12,338 : 1  
 $10,000 : 1       5%     : 2            Mean   :0.8261   $12,712 : 1  
 $10,500 : 1       7%     : 2            3rd Qu.:1.0000   $13,020 : 1  
 $11,320 : 1       70%    : 2            Max.   :1.0000   $17,697 : 1  
 (Other) :14       (Other):11                             (Other) :16  
   Date    Default
 Aug : 0   0:10   
 July:23   1:13   

You’ll have to do some formatting, or export to Excel! So fast and easy with this one.
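
In the output above, Revolving.Balance, Revolving.Utilization and Loan.Amount are counted as categories rather than summarized as numbers, most likely because the "$", "," and "%" characters keep R from reading them as numeric. Here is a minimal sketch of one way to clean such columns before running the same split and map pattern. The data frame credit_toy and the helper to_number are hypothetical, and the rows are invented for illustration; only the column names mirror the output.

```r
library(purrr)

# Toy rows made up for this example; they are not from the real credit data.
credit_toy <- data.frame(
  Date = c("Aug", "Aug", "July", "July"),
  Credit.Score = c(485, 630, 701, 802),
  Revolving.Balance = c("$2,000", "$27,000", "$11,200", "$6,100"),
  Revolving.Utilization = c("100%", "65%", "11%", "20%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip "$", "," and "%" and convert to numeric.
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))
credit_toy$Revolving.Balance     <- to_number(credit_toy$Revolving.Balance)
credit_toy$Revolving.Utilization <- to_number(credit_toy$Revolving.Utilization)

# Same split + map pattern, saved as a list with an assignment.
by_month <- credit_toy %>% split(credit_toy$Date) %>% map(summary)

# Pull out one group's summary by name.
by_month$Aug
```

With the columns converted, summary() reports Min., Median, Mean and so on for the balance and utilization instead of value counts, and each group's summary can be pulled from the list by name.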