[Solved] Understanding kmeans clustering in r [closed]


1. What do the cluster sizes mean?

You provided 16 records and told kmeans to find 3 clusters. It clustered those 16 records into 3 groups of A: 3 records, B: 10 records and C: 3 records.
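If it helps to see this on something you can run, here is a minimal sketch. The data is made up (16 random records with your three column names), so the sizes will not be 3, 10 and 3 like yours; the point is only that the sizes always add up to the number of records you passed in.

```r
set.seed(42)

# Hypothetical stand-in for your data: 16 records, 3 numeric columns
df <- data.frame(
  google        = runif(16),
  stackoverflow = runif(16),
  tester        = runif(16)
)

# Ask kmeans for 3 clusters
fit <- kmeans(df, centers = 3)

fit$size        # one count per cluster
sum(fit$size)   # always equals nrow(df), here 16
```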

2. What are the cluster means?

These numbers give the location, in N-dimensional space, of the centroid (the “mean”) of each cluster. You have three clusters, so you have three means. You have three dimensions (“google”, “stackoverflow”, “tester”), so each centroid has a value in each dimension. Reading the numbers across a row gives the location of a single centroid.
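One way to convince yourself that a row of the cluster means really is just an average is to recompute it by hand. This is a sketch on made-up data (the values are random, only the column names are taken from your question):

```r
set.seed(1)

# Hypothetical data: 16 records in 3 dimensions
df <- data.frame(
  google        = rnorm(16),
  stackoverflow = rnorm(16),
  tester        = rnorm(16)
)

fit <- kmeans(df, centers = 3)

# 3 x 3 matrix: one row per cluster, one column per dimension
fit$centers

# Recompute the centroid of cluster 1 as the column means of its members
members_1 <- df[fit$cluster == 1, , drop = FALSE]
manual_centroid <- colMeans(members_1)

# Should be TRUE: the "cluster mean" is the plain per-column mean
centroid_matches <- isTRUE(all.equal(unname(manual_centroid),
                                     unname(fit$centers[1, ])))
centroid_matches
```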

3. What is the Clustering vector?

This is the cluster label the algorithm assigned to each record you passed in. Remember how earlier I said there were 3 clusters of size 3, 10, and 3? These clusters are labeled 1, 2 and 3, and the algorithm stores the label for each record, in the same order as your input rows, in this vector. Here, you can see that there are 3 “1”s, 10 “2”s, and 3 “3”s. Does that make sense?
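To check the link between the clustering vector and the cluster sizes yourself, you can count the labels. Again a sketch on made-up data, so the actual counts will differ from yours:

```r
set.seed(7)

# Hypothetical data: 16 records, 3 numeric columns
df <- data.frame(
  google        = runif(16),
  stackoverflow = runif(16),
  tester        = runif(16)
)

fit <- kmeans(df, centers = 3)

# One label (1, 2 or 3) per input row, in input order
fit$cluster

# Counting how often each label occurs reproduces fit$size
label_counts <- tabulate(fit$cluster, nbins = 3)
label_counts
fit$size

# The labels are handy for attaching the assignment back onto the data
df$cluster <- fit$cluster
head(df)
```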

4. What are between_SS & total_SS?

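This is notation generally used in ANOVA. total_SS is the total sum of squares: the summed squared distance of every record from the overall mean of the data. between_SS is the part of that total that comes from the separation between the cluster centroids, i.e. total_SS minus the within-cluster sum of squares. The ratio between_SS / total_SS that R prints is therefore the share of the total variation accounted for by the clustering. You might find this helpful: http://www-ist.massey.ac.nz/dstirlin/CAST/CAST/HrandBlock/randBlock7.html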
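If you want to see where these two numbers come from, you can rebuild them from the fitted object. This is a sketch on made-up data; `totss`, `betweenss` and `tot.withinss` are the components of the object returned by `kmeans` that hold these quantities.

```r
set.seed(123)

# Hypothetical data: 16 records, 3 numeric columns
df <- data.frame(
  google        = rnorm(16),
  stackoverflow = rnorm(16),
  tester        = rnorm(16)
)

fit <- kmeans(df, centers = 3)

# total_SS: squared distance of every record from the overall mean
total_manual <- sum(scale(as.matrix(df), center = TRUE, scale = FALSE)^2)

# between_SS: squared distance of each centroid from the overall mean,
# weighted by the number of records in that cluster
centroid_offsets <- sweep(fit$centers, 2, colMeans(df))
between_manual <- sum(fit$size * rowSums(centroid_offsets^2))

c(total_manual, fit$totss)          # should agree
c(between_manual, fit$betweenss)    # should agree

# The decomposition: total = between + within
fit$totss - (fit$betweenss + fit$tot.withinss)  # ~ 0

# The percentage shown in the printed kmeans output
ratio <- fit$betweenss / fit$totss
ratio
```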
