I need to get, for each object, the probability that it belongs to each of the clusters. For k-means, this can be done by computing the distance from each object to the center of each cluster: the point farthest from a cluster belongs to it with a probability close to zero, and the closest point belongs to it with probability 1.
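A minimal NumPy sketch of the distance step, assuming the cluster centers are already available (for example from a fitted scikit-learn KMeans, whose transform method returns the same object-to-center distance matrix). The function name center_distances and the random stand-in data are hypothetical.

```python
import numpy as np

def center_distances(X, centers):
    """Euclidean distance from every row of X to every cluster center.

    X: (n_objects, n_features), centers: (n_clusters, n_features).
    Returns an (n_objects, n_clusters) matrix.
    """
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, avoiding an (n, k, m) temporary array
    sq = (
        np.sum(X ** 2, axis=1)[:, None]
        - 2.0 * X @ centers.T
        + np.sum(centers ** 2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.sqrt(np.maximum(sq, 0.0))

# Small random stand-in for the real 8000 x 5000 data (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
centers = X[rng.choice(len(X), size=5, replace=False)]
D = center_distances(X, centers)
print(D.shape)  # (200, 5)
```

The expansion of the squared norm keeps memory proportional to n_objects × n_clusters, which matters for an 8000 × 5000 input.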

The problem is that linear scaling (MinMaxScaler, for example) gives a result where almost all objects have nearly the same probability of belonging to each of the clusters. (The initial data is an 8000×5000 matrix.)
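For reference, the linear approach described here amounts to a per-cluster min-max scaling of the distance matrix, essentially 1 minus the MinMaxScaler output applied column-wise. This is a sketch; linear_membership is a hypothetical name and D is assumed to be an objects × clusters distance matrix. Because the map is linear, if most distances to a center lie in a narrow band, most memberships lie in a correspondingly narrow band, which is consistent with the near-uniform result described.

```python
import numpy as np

def linear_membership(D):
    """1 at the closest object, 0 at the farthest, linear in between (per column)."""
    d_min = D.min(axis=0)
    d_max = D.max(axis=0)
    # Guard against division by zero for a constant column
    span = np.where(d_max > d_min, d_max - d_min, 1.0)
    return 1.0 - (D - d_min) / span

D = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 6.0]])
print(linear_membership(D))
```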

How do I choose a nonlinearity that gives 1 at the nearest point, say 0.5 at the farthest point that still belongs to the cluster, and then falls off sharply? And how can this be automated in Python (there are about 25 clusters)?
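One possible nonlinearity that meets these three conditions exactly, sketched under assumptions: for each cluster take d_near, the smallest distance from any object to its center, and d_far, the largest distance among objects assigned to it; rescale t = (d - d_near) / (d_far - d_near) and use p = 0.5 ** (t ** s). Then p = 1 at t = 0, p = 0.5 at t = 1, and for s > 1 it drops quickly beyond that (with s = 2, p = 1/16 at t = 2). The function name and the exponent sharpness are hypothetical choices, not something from the thread; the calibration runs over every column, so all ~25 clusters are handled in one call.

```python
import numpy as np

def calibrated_membership(D, sharpness=2.0):
    """Map distances to memberships with p = 1 at the nearest object and
    p = 0.5 at the farthest object assigned to each cluster.

    D: (n_objects, n_clusters) distances to cluster centers.
    sharpness: hypothetical exponent; larger values make p fall faster past 0.5.
    """
    labels = D.argmin(axis=1)             # hard k-means assignment
    n_clusters = D.shape[1]
    d_near = D.min(axis=0)                # nearest object to each center
    d_far = np.empty(n_clusters)
    for j in range(n_clusters):
        members = D[labels == j, j]       # distances of cluster j's own members
        # Fall back to the column maximum if a cluster happens to be empty
        d_far[j] = members.max() if members.size else D[:, j].max()
    # Guard against a zero span (e.g. a single-member cluster)
    span = np.where(d_far > d_near, d_far - d_near, 1.0)
    t = (D - d_near) / span               # 0 at nearest object, 1 at farthest member
    return 0.5 ** (t ** sharpness)

D = np.array([[1.0, 5.0], [2.0, 3.0], [4.0, 1.0], [6.0, 2.0]])
P = calibrated_membership(D)
```

The memberships are computed independently per cluster, so a row does not sum to 1; divide each row by its sum if a probability distribution over clusters is needed.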

  • Found an answer via the link. - Tolkachev Ivan

1 answer

It may be worth using the logistic function of the form:

 sigma(x) = 1/(1+exp(-x))
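To get the behavior asked for in the question, the logistic function has to be shifted and scaled per cluster. A sketch, assuming d_near and d_far are the nearest and farthest member distances for a cluster: p(d) = sigma(-k (d - d_far)) with k = ln(p_near / (1 - p_near)) / (d_far - d_near). This gives p = 0.5 at d_far and p = p_near at d_near; since the logistic never reaches exactly 1, p_near = 0.99 is a hypothetical target. Beyond d_far the odds shrink by the same factor for every further (d_far - d_near), so with p_near = 0.99, p is 0.01 one span past d_far.

```python
import numpy as np

def sigma(x):
    # Logistic function from the answer; x is clipped to avoid overflow in exp
    x = np.clip(x, -500.0, 500.0)
    return 1.0 / (1.0 + np.exp(-x))

def logistic_membership(d, d_near, d_far, p_near=0.99):
    """Logistic membership with p(d_far) = 0.5 and p(d_near) = p_near."""
    # Slope chosen so that sigma(k * (d_far - d_near)) == p_near
    k = np.log(p_near / (1.0 - p_near)) / (d_far - d_near)
    return sigma(-k * (np.asarray(d, dtype=float) - d_far))

print(logistic_membership([1.0, 2.0, 3.0], d_near=1.0, d_far=2.0))
# approximately 0.99, 0.5, 0.01
```

If d_near and d_far are arrays with one entry per cluster, NumPy broadcasting lets the same call process a whole objects × clusters distance matrix at once.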