A dictionary is given in which different keys correspond to vectors of different sizes. I need to find the difference between each value under one key and each value under the other keys. The structure of the dictionary varies with the incoming dataset, i.e. the algorithm should work with values of different sizes. More precisely, the keys are the indices of the points that are centroids, and the values are vectors of the indices of the points that belong to that cluster, in case this information helps :) I tried to use broadcasting. It worked on a toy example, but not on a real one.
in:
    from collections import defaultdict
    import numpy as np

    a = [5, 6, 6, 6, 7]
    b = [4, 3, 4, 5, 6, 7, 8, 9, 0, 5, 2, 46]
    Centroids = defaultdict(list)
    # zip stops at the shorter list, so only the first 5 values of b are used
    for i, j in zip(a, b):
        Centroids[i].append(j)
    Centroids
out:
    defaultdict(list, {5: [4], 6: [3, 4, 5], 7: [6]})
in:
    k = []
    for i in list(Centroids.values()):
        k.append(np.array(i))
    # dtype=object is required for ragged input in recent NumPy versions
    print(np.array(k, dtype=object))
    a = []
    for i in range(3):
        for j in range(3):
            a.append(k[i] - k[j])
    print(a)
out:
    [array([4]) array([3, 4, 5]) array([6])]
    [array([0]), array([ 1, 0, -1]), array([-2]), array([-1, 0, 1]), array([0, 0, 0]), array([-3, -2, -1]), array([2]), array([3, 2, 1]), array([0])]

This is an example that did not work:
in:
    k = centers(data)
out:
    [array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 64, 65, 67, 68, 69, 70, 71, 72, 75]),
     array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]),
     array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 73, 74]),
     array([59, 60, 63, 66]),
     array([61, 62])]
in:
    a = []
    for i in range(5):
        for j in range(5):
            a.append(k[i] - k[j])

This raises the error:
    ValueError: operands could not be broadcast together with shapes (29,) (20,)

The question is: how can this difference be computed? My failed attempts are shown above. Thank you in advance.
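
For reference, here is a minimal sketch of one way to read the problem, assuming that "the difference between each value of one key and each value of the others" means every pairwise element difference between two clusters. The toy example only ran because arrays of length 1 broadcast against any length, whereas shapes (29,) and (20,) cannot be aligned. Adding an axis to each operand gives shapes (n, 1) and (1, m), which always broadcast to an (n, m) matrix. The helper name `pairwise_diff` and the dictionary `diffs` are hypothetical and not part of the original code.

```python
import numpy as np


def pairwise_diff(x, y):
    """Return a (len(x), len(y)) matrix whose entry [p, q] is x[p] - y[q]."""
    x = np.asarray(x)
    y = np.asarray(y)
    # x[:, None] has shape (len(x), 1) and y[None, :] has shape (1, len(y)),
    # so the subtraction broadcasts for any pair of lengths.
    return x[:, None] - y[None, :]


# Toy clusters from the question, with different lengths.
k = [np.array([4]), np.array([3, 4, 5]), np.array([6])]

# Mirror the original double loop, storing each result under its (i, j) pair.
diffs = {
    (i, j): pairwise_diff(k[i], k[j])
    for i in range(len(k))
    for j in range(len(k))
}

print(diffs[(1, 2)].shape)  # one row per element of k[1], one column per element of k[2]
```

The same call would apply to the failing case, where the result for the first two clusters would be a 29 x 20 matrix instead of an error. If a flat vector is preferred, `.ravel()` on each matrix flattens it.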