There is an area with a bunch of scattered points; we need to unite in the area those points that are at a distance less than a certain constant from each other.

Something like that:

enter image description here

  • The first question is the size of the heap. If these are orders of thousands of points, you can write a brute force. - gbg
  • The distance between the points is better to be taken as the square root of the sum of the squares of the difference - according to the Euclidean metric ru.wikipedia.org/wiki/… - gbg
  • this is understandable, but in my particular case, I need just the difference of coordinates. - Alexander Yuryevich
  • one
    Clustering the same! - VladD
  • 2
    What clustering algorithm with a previously unknown number of clusters would you advise? Perhaps there is a link to the page with a formal description? - Alexander Yurievich

2 answers 2

The use of the distance matrix as a solution to the clustering problem in many cases is justified, although it leads, in general, to redundant memory reservation due to the so-called data mirroring effect. However, consider this method.

In the question tags there is no mention of the implementation language, and therefore, probably, it is permissible to use any, in particular C ++.

To begin with, we will create a structure traditionally used in such cases (it is possible and a class, but there is no special sense):

struct Point { Point() : _x(0), _y(0) {} Point(int x, int y) : _x(x), _y(y) {} int _x, _y; }; 

We will experiment with the same points that are indicated in the question (of course, any others are possible):

 std::vector<Point> pnts; pnts.push_back(Point(45,45)); pnts.push_back(Point(0,0)); pnts.push_back(Point(21,21)); pnts.push_back(Point(29,29)); pnts.push_back(Point(5,5)); pnts.push_back(Point(33,33)); pnts.push_back(Point(40,40)); pnts.push_back(Point(80,80)); pnts.push_back(Point(20,20)); pnts.push_back(Point(2,2)); pnts.push_back(Point(25,25)); 

The distance matrix is ​​created by enumerating each point with each and saving, in fact, the distance between them:

 const int n = pnts.size(); std::vector<std::vector<int> > distances(n); for(int i = 0; i < n; ++i) { const Point &p1 = pnts.at(i); // По ΡƒΠΌΠΎΠ»Ρ‡Π°Π½ΠΈΡŽ заполняСм ΠΊΠ°ΠΆΠ΄ΡƒΡŽ ΠΊΠΎΠ»ΠΎΠ½ΠΊΡƒ нулями. std::vector<int> dist(n,0); for(int j = 0; j < n; ++j) { const Point &p2 = pnts.at(j); if(i == j) { // Если Ρ€Π΅Ρ‡ΡŒ ΠΎΠ± ΠΎΠ΄Π½ΠΎΠΉ ΠΈ Ρ‚ΠΎΠΉ ΠΆΠ΅ Ρ‚ΠΎΡ‡ΠΊΠ΅, // Ρ‚ΠΎ ΠΈ Π½Π΅Ρ‡Π΅Π³ΠΎ Π²Ρ‹Ρ‡ΠΈΡΠ»ΡΡ‚ΡŒ, расстояниС - 0. continue; } else if(i > j) { // Если ΡƒΠΆΠ΅ Ρ€Π°Π½Π΅Π΅ успСли Π²Ρ‹Ρ‡ΠΈΡΠ»ΠΈΡ‚ΡŒ расстояниС ΠΌΠ΅ΠΆΠ΄Ρƒ // двумя Ρ‚ΠΎΡ‡ΠΊΠ°ΠΌΠΈ, Ρ‚ΠΎ просто ΠΊΠΎΠΏΠΈΡ€ΡƒΠ΅ΠΌ Π·Π½Π°Ρ‡Π΅Π½ΠΈΠ΅. dist[j] = distances[j][i]; } else { // БобствСнно, вычислСниС расстояния. dist[j] = std::sqrt(std::pow(p1._x-p2._x,2) + std::pow(p1._y-p2._y,2)); } } distances[i] = dist; } 

The resulting matrix on the existing data set will look like this:

  0 63 33 22 56 16 7 49 35 60 28 63 0 29 41 7 46 56 113 28 2 35 33 29 0 11 22 16 26 83 1 26 5 22 41 11 0 33 5 15 72 12 38 5 56 7 22 33 0 39 49 106 21 4 28 16 46 16 5 39 0 9 66 18 43 11 07 56 26 15 49 9 0 56 28 53 21 49 113 83 72 106 66 56 0 84 110 77 35 28 1 12 21 18 28 84 0 25 7 60 2 26 38 4 43 53 110 25 0 32 28 35 5 5 28 11 21 77 7 32 0 

As is well seen, it is mirror, and therefore redundant, and it is this factor that can be the main reason for the impossibility of using the method in question.

I deliberately do not use double type for distance storage, since accuracy with an integer ( int ) is more than enough, plus some memory savings, plus this same vector can be easily used as an index matrix.

The matrix of indices serves exactly the task of reducing the number of iterations of an array of points. This is actually a map of indices, sorted by distance. Thus, it turns out that for each point the closest to it other points will always be in the first places.

 // Π—Π΄Π΅ΡΡŒ ΠΌΠΎΠΆΠ½ΠΎ ΡΡΠΊΠΎΠ½ΠΎΠΌΠΈΡ‚ΡŒ ΠΈ Π½Π΅ ΡΠΎΠ·Π΄Π°Π²Π°Ρ‚ΡŒ ΠΎΡ‚Π΄Π΅Π»ΡŒΠ½Ρ‹ΠΉ Π²Π΅ΠΊΡ‚ΠΎΡ€ // ΠΏΠΎΠ΄ ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Ρƒ индСксов, просто пСрСзаписав значСния Π²Π΅ΠΊΡ‚ΠΎΡ€Π° расстояний, // Ссли ΠΊΠΎΠ½Π΅Ρ‡Π½ΠΎ Π½Π΅ планируСтся Π΅Π³ΠΎ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ Π΄Π°Π»Π΅Π΅. std::vector<std::vector<int> > indexes(n); // Π¨Π°Π±Π»ΠΎΠ½ ΠΊΠΎΠ»ΠΎΠ½ΠΊΠΈ индСксов. std::vector<int> tpl(n); for(int i = 0; i < n; ++i) tpl[i] = i; for(int i = 0; i < n; ++i) { std::vector<int> vct = tpl; const std::vector<int> &dist = distances[i]; std::sort(vct.begin(), vct.end(), [&dist](int i1, int i2) { return dist[i1] < dist[i2]; }); indexes[i] = vct; } 

The result is the following set of indices of points:

  0 6 5 3 10 2 8 7 4 9 1 1 9 4 8 2 10 3 5 6 0 7 2 8 10 3 5 4 6 9 1 0 7 3 5 10 2 8 6 0 4 9 1 7 4 9 1 8 2 10 3 5 6 0 7 5 3 6 10 0 2 8 4 9 1 7 6 0 5 3 10 2 8 4 9 1 7 7 0 6 5 3 10 2 8 4 9 1 8 2 10 3 5 4 9 1 6 0 7 9 1 4 8 2 10 3 5 6 0 7 10 2 3 8 5 6 0 4 9 1 7 

The first column (index "0") of the matrix under consideration contains the indices of points in relation to themselves. The distance here is obviously zero and it is for this reason that after sorting the cluster formation cycle can be started from the column under the index "1".

The second column (index "1") is the index of the closest point with respect to the point with the index in the first column. Similarly, the contents of all subsequent columns, but with increasing distance.

Having on hand a matrix of sorted indexes, nothing more remains, how to proceed, in fact, to clustering.

The problem's author has a feature that requires consideration of different maxima for horizontal and vertical pixel distances:

 const int hrz_max = 10; const int vrt_max = 10; 

Clustering is convenient to perform recursively:

 void check(const std::vector<Point> &pnts , const std::vector<std::vector<int> > &indexes , int pnt_idx, std::vector<int> &d) { const Point &p1 = pnts[pnt_idx]; d.push_back(pnt_idx); // ... со Π²Ρ‚ΠΎΡ€ΠΎΠΉ ΠΊΠΎΠ»ΠΎΠ½ΠΊΠΈ. for(int i = 1, n = pnts.size(); i < n; ++i) { const int pnt2_idx = indexes[pnt_idx][i]; if(std::find(d.begin(),d.end(),pnt2_idx) != d.end()) continue; const Point &p2 = pnts[pnt2_idx]; if(std::abs(p1._x-p2._x) <= hrz_max && std::abs(p1._y-p2._y) <= vrt_max) { check(pnts, indexes, pnt2_idx, d); } else { // НСзачСм ΠΊΡ€ΡƒΡ‚ΠΈΡ‚ΡŒ Ρ†ΠΈΠΊΠ» Π΄Π°Π»Π΅Π΅, Ρ‚.ΠΊ. расстояния // отсортированы ΠΏΠΎ Π²ΠΎΠ·Ρ€Π°ΡΡ‚Π°Π½ΠΈΡŽ значСния. break; } } } 

Function Arguments:

const std::vector<Point> &pnts - source point vector;

const std::vector<std::vector<int> > &indexes - matrix of indices;

int pnt_idx - index of the point for which we are forming the cluster;

std::vector<int> &d is a vector that will contain indices of points of a single cluster.

    wrote my bike in javascript, can anyone else come in handy

     <script type="text/javascript"> /* псСвдокласс Ρ‚ΠΎΡ‡ΠΊΠΈ */ function Point(x,y) { this._x = x; this._y = y; } Point.prototype.getX = function() { return this._x; } Point.prototype.getY = function() { return this._y; } /* расчСты */ var points = new Array(); // массив Ρ‚ΠΎΡ‡Π΅ΠΊ points.push(new Point(45, 45)); points.push(new Point(0, 0)); points.push(new Point(21, 21)); points.push(new Point(29, 29)); points.push(new Point(5, 5)); points.push(new Point(33, 33)); points.push(new Point(40, 40)); points.push(new Point(80, 80)); points.push(new Point(20, 20)); points.push(new Point(2, 2)); points.push(new Point(25, 25)); const mX = 5; // максимальноС растояниС ΠΏΠΎ X const mY = 5; // максимальноС растояниС ΠΏΠΎ Y /* растояниС ΠΌΠ΅ΠΆΠ΄Ρƒ двумя Ρ‚ΠΎΡ‡ΠΊΠ°ΠΌΠΈ */ function calcDistance(p1, p2) { return Math.sqrt(Math.pow(p1.getX() - p2.getX(), 2) + Math.pow(p1.getY() - p2.getY(), 2)) } /* условия объСдинСния */ function isNear(p1, p2, mX, mY) { return Math.abs(p1.getX() - p2.getX()) <= mX && Math.abs(p1.getY() - p2.getY()) <= mY ? true : false; } /* расчСт ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Ρ‹ расстояний */ function calcMasDistances(mas) { var l = mas.length; var r = []; for (var i1 = 0; i1 < l; i1++) { r[i1] = []; for (var i2 = 0; i2 < l; i2++) { if (i1 == i2) { r[i1][i2] = 0; } else if (i1 < i2) { r[i1][i2] = calcDistance(mas[i1], mas[i2]); } else { r[i1][i2] = r[i2][i1]; } } } return r; } /* смСна элСмСнтов массива мСстами */ function swap(n1, n2, mas) { var x; x = mas[n1]; mas[n1] = mas[n2]; mas[n2] = x; return mas; } /* расчСт ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Ρ‹ индСксов близости */ function calcMasIndexes(mas) { var l = mas.length; var r = []; var n = []; for (var i = 0; i < l; i++) { n[i] = i; } for (var i = 0; i < l; i++) { var ns = n.slice(); var ds = mas[i].slice(); for (var i1 = 0; i1 < l - 1; i1++) { for (var i2 = i1 + 1; i2 < l; i2++) { if (ds[i1] > ds[i2]) { ns = swap(i1, i2, ns); ds = swap(i1, i2, ds); } } } r[i] = ns.slice(); } return r; } /* сущСствованиС элСмСнта Π² массивС */ function exist(n, mas) { for (var i = 0; i < mas.length; i++) { if (n == mas[i]) { return true; } } return false; } /* кластСризация */ function clastering(points, indexes, used, pn, cl) { if (!used[pn]) { cl[cl.length] = pn; used[pn] = true; for (var i = 1; i < points.length; i++) { var pn2 = indexes[pn][i]; if (exist(pn2, cl)) { continue; } // условиС Π²Ρ‹Ρ…ΠΎΠ΄Π° ΠΈΠ· Ρ†ΠΈΠΊΠ»Π° if(isNear(points[pn], points[pn2], mX, mY)) { clastering(points, indexes, used, pn2, cl); } else { break; } } } } var distances = calcMasDistances(points); // ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Π° растояний var indexes = calcMasIndexes(distances); // ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Π° индСксов var used = []; // массив задСйствованных элСмСнтов for (var i = 0; i < points.length; i++) { used[i] = false; } var clasters = []; // ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Π° кластСров /* кластСризация */ for (var i = 0; i < points.length; i++) { var cl = []; clastering(points, indexes, used, i, cl); if (cl.length > 0) { clasters.push(cl.slice()); } } /* Π²Ρ‹Π²ΠΎΠ΄ */ distances.forEach(function(item1, i1, distances) { item1.forEach(function(item2, i2, item1) { document.write(Math.round(parseFloat(item2)) + (i2 != item1.length - 1 ? ',' : '')); }); document.write('</br>'); }); document.write('<hr>'); indexes.forEach(function(item1, i1, indexes) { document.write(item1 + '</br>'); }); document.write('<hr>'); clasters.forEach(function(item1, i1, clasters) { document.write(item1 + '</br>'); }); </script> 

    Conclusion:

    distance matrix

     0,64,34,23,57,17,7,49,35,61,28 64,0,30,41,7,47,57,113,28,3,35 34,30,0,11,23,17,27,83,1,27,6 23,41,11,0,34,6,16,72,13,38,6 57,7,23,34,0,40,49,106,21,4,28 17,47,17,6,40,0,10,66,18,44,11 7,57,27,16,49,10,0,57,28,54,21 49,113,83,72,106,66,57,0,85,110,78 35,28,1,13,21,18,28,85,0,25,7 61,3,27,38,4,44,54,110,25,0,33 28,35,6,6,28,11,21,78,7,33,0 

    proximity matrix

     0,6,5,3,10,2,8,7,4,9,1 1,9,4,8,2,10,3,5,6,0,7 2,8,10,3,5,4,9,6,1,0,7 3,5,10,2,8,6,0,4,9,1,7 4,9,1,8,2,10,3,5,6,0,7 5,3,6,10,0,2,8,4,9,1,7 6,0,5,3,10,2,8,4,9,7,1 7,0,6,5,3,10,2,8,4,9,1 8,2,10,3,5,4,9,6,1,0,7 9,1,4,8,2,10,3,5,6,0,7 10,3,2,8,5,6,0,4,9,1,7 

    cluster matrix

     0,6 1,9,4 2,8,10,3,5 7