Hello. I have two random variables. The task is to try to classify the data using the k-means method by splitting it into three clusters. The results surprised me.

[image: scatter plot of the k-means result]

Why is part of the data belonging to cluster 3 (blue dots) surrounded by dots from cluster 2 (green dots)? You can see it in the lower left corner of the picture if you look closely.

    1 answer

    This is a normal situation. Well, not quite normal, but this clustering method does sometimes produce such strange results. The reason is that distances are measured to the centroids, not to neighboring points: each point goes to the cluster whose centroid is nearest, whatever cluster its neighbors end up in. More information about this and other nuances can be found at http://dungba.org/the-strange-effect-of-k-means/
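To make the centroid point concrete, here is a minimal sketch of the assignment step only. The centroids and points are made-up values for illustration, and `nearest_centroid` is a hypothetical helper name, not part of any library.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

# Hypothetical centroids and points, chosen only to illustrate the effect.
centroids = [(0.0, 0.0), (10.0, 0.0)]
points = [(4.9, 0.0), (5.1, 0.0), (9.0, 1.0)]

# Each point is compared with the centroids only, never with the other points.
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 1]
```

The first two points are only 0.2 apart, yet they land in different clusters, because the first is slightly closer to centroid 0 and the second to centroid 1.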

    • Great! I will read it. Thanks for the useful information. But on the next run this effect may disappear, that is, k-means classifies normally. Is that normal? What I want to ask is: do you rerun k-means as many times as necessary until it gives a partition that satisfies you? - Dmitry
    • k-means does not classify anything, since it is not a classification method. And the clusters really do turn out differently from one restart to the next, so it is considered normal to repeat the procedure several times. But if the pictures are radically different, it is better not to pick the one you like, but to accept that the data could not be divided into clusters this way. - Ogurtsov
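One way to check how stable the result is across restarts is sketched below. It is a plain-Python version of Lloyd's algorithm run from several random starts. The synthetic data, the seeds, and the helper names `kmeans` and `partition` are all assumptions made for illustration, not the asker's data or code.

```python
import math
import random

def kmeans(points, k, seed, n_iter=100):
    """Plain Lloyd's algorithm from one random start; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia

def partition(labels):
    """Label-independent form of a clustering: a set of clusters of point indices."""
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# Synthetic two-variable data: three Gaussian blobs (an assumption for the demo).
data_rng = random.Random(0)
points = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
          for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(50)]

runs = [kmeans(points, 3, seed) for seed in range(10)]
distinct = {partition(labels) for labels, _ in runs}
best_labels, best_inertia = min(runs, key=lambda r: r[1])

print("distinct partitions over 10 restarts:", len(distinct))
print("lowest inertia:", round(best_inertia, 2))
```

If most restarts give the same partition, keeping the run with the lowest inertia is reasonable. If nearly every restart gives a different partition, that is the situation described above where the data does not split into clusters this way. In scikit-learn, `KMeans` handles the restarts through its `n_init` parameter and keeps the run with the lowest `inertia_`.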