
Simplicity and complexity of primitives or how to determine unnecessary preprocessing for a neural network

This is the third article in a series on analyzing and studying ellipses, triangles and other geometric shapes.
The previous articles raised some very interesting questions from readers, in particular about the complexity or simplicity of various training sequences. These questions are genuinely interesting, for example: how much harder is a triangle to learn than a quadrilateral or another polygon?



Let us try to compare, and for comparison we have an excellent idea, proven by generations of students: the shorter the cheat sheet, the easier the exam.

This article is also simply the result of curiosity and idle interest; nothing in it comes from practice, and for practical tasks there are a couple of good ideas here, but almost nothing for copy-pasting. This is a small study of the complexity of training sequences; the author's reasoning and code are laid out, and you can check, extend or change everything yourself.

So, let's try to find out which geometric figure is harder or easier to segment, which course of lectures is more comprehensible to the AI and better assimilated.

There are many different geometric figures, but we will compare only triangles, quadrilaterals and five-pointed stars. We will use a simple method for constructing a training sequence: divide a 128x128 single-channel image into four quarters and randomly place an ellipse in one quarter and, for example, a triangle in another. We will detect a triangle of the same color as the ellipse, i.e. the task is to train the network to distinguish, say, a quadrilateral from an ellipse filled with the same color. Here are some examples of the pictures we will study.







We will not detect a triangle and a quadrilateral in the same picture; we will detect them separately, in different training sequences, against an ellipse-like disturbance in the background.

We take the classic U-net and three kinds of training sequences: with triangles, with quadrilaterals and with stars.

So, given:


Idea for verification:


Let's start: we generate 10,000 pairs of pictures (quadrilaterals with ellipses) and masks and examine them carefully. We are interested in how short the cheat sheet turns out to be and what its length depends on.

Load the libraries and define the sizes of the image arrays
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import math
from tqdm import tqdm

from skimage.draw import ellipse, polygon

from keras import Model
from keras.optimizers import Adam
from keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
from keras.layers import BatchNormalization, Activation, Add, Dropout
from keras.losses import binary_crossentropy
from keras import backend as K
import tensorflow as tf
import keras as keras

w_size = 128
train_num = 10000
radius_min = 10
radius_max = 20


define loss and accuracy functions
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred = K.cast(y_pred, 'float32')
    y_pred_f = K.cast(K.greater(K.flatten(y_pred), 0.5), 'float32')
    intersection = y_true_f * y_pred_f
    score = 2. * K.sum(intersection) / (K.sum(y_true_f) + K.sum(y_pred_f))
    return score

def dice_loss(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = y_true_f * y_pred_f
    score = (2. * K.sum(intersection) + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1. - score

def bce_dice_loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

def get_iou_vector(A, B):
    # Numpy version
    batch_size = A.shape[0]
    metric = 0.0
    for batch in range(batch_size):
        t, p = A[batch], B[batch]
        true = np.sum(t)
        pred = np.sum(p)

        # deal with empty mask first
        if true == 0:
            metric += (pred == 0)
            continue

        # non-empty mask case: the union is never empty,
        # hence it is safe to divide by its number of pixels
        intersection = np.sum(t * p)
        union = true + pred - intersection
        iou = intersection / union

        # the iou metric is a stepwise approximation of the real iou over 0.5
        iou = np.floor(max(0, (iou - 0.45)*20)) / 10

        metric += iou

    # take the average over all images in the batch
    metric /= batch_size
    return metric

def my_iou_metric(label, pred):
    # Tensorflow version
    return tf.py_func(get_iou_vector, [label, pred > 0.5], tf.float64)

from keras.utils.generic_utils import get_custom_objects

get_custom_objects().update({'bce_dice_loss': bce_dice_loss})
get_custom_objects().update({'dice_loss': dice_loss})
get_custom_objects().update({'dice_coef': dice_coef})
get_custom_objects().update({'my_iou_metric': my_iou_metric})


We will use the metric from the first article. Let me remind readers that we predict a pixel mask, "background" or "quadrilateral", and evaluate whether each prediction is true or false. That gives four possible outcomes: we correctly predicted that a pixel is background, correctly predicted that it is a quadrilateral, or made a mistake in predicting "background" or "quadrilateral". Over all the pictures and all the pixels we count these four outcomes and compute the score; this is the result of the network. The fewer erroneous predictions and the more true ones, the more accurate the result and the better the network works.
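To make this concrete, here is a tiny NumPy illustration (not from the article, the toy masks are made up) of counting the four outcomes on one picture and of the intersection-over-union that my_iou_metric approximates:

import numpy as np

# toy 4x4 ground-truth mask (1 = "quadrilateral" pixel, 0 = "background")
y_true = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]], dtype=float)

# a hypothetical thresholded prediction that misses one pixel and adds one
y_pred = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 1],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0]], dtype=float)

tp = np.sum((y_true == 1) & (y_pred == 1))  # shape pixels predicted correctly
tn = np.sum((y_true == 0) & (y_pred == 0))  # background predicted correctly
fp = np.sum((y_true == 0) & (y_pred == 1))  # background mistaken for shape
fn = np.sum((y_true == 1) & (y_pred == 0))  # shape mistaken for background

iou = tp / (tp + fp + fn)                   # intersection over union
print(tp, tn, fp, fn, iou)                  # 3 11 1 1 0.6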

We treat the network as a black box; we will not look at what happens inside it, how the weights change or how the gradients are chosen. That look into the depths of the network can come later, when we compare networks.

simple u-net
def build_model(input_layer, start_neurons):
    # 128 -> 64
    conv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(input_layer)
    conv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(conv1)
    pool1 = MaxPooling2D((2, 2))(conv1)
    pool1 = Dropout(0.25)(pool1)

    # 64 -> 32
    conv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(pool1)
    conv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(conv2)
    pool2 = MaxPooling2D((2, 2))(conv2)
    pool2 = Dropout(0.5)(pool2)

    # 32 -> 16
    conv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(pool2)
    conv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(conv3)
    pool3 = MaxPooling2D((2, 2))(conv3)
    pool3 = Dropout(0.5)(pool3)

    # 16 -> 8
    conv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(pool3)
    conv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(conv4)
    pool4 = MaxPooling2D((2, 2))(conv4)
    pool4 = Dropout(0.5)(pool4)

    # Middle
    convm = Conv2D(start_neurons * 16, (3, 3), activation="relu", padding="same")(pool4)
    convm = Conv2D(start_neurons * 16, (3, 3), activation="relu", padding="same")(convm)

    # 8 -> 16
    deconv4 = Conv2DTranspose(start_neurons * 8, (3, 3), strides=(2, 2), padding="same")(convm)
    uconv4 = concatenate([deconv4, conv4])
    uconv4 = Dropout(0.5)(uconv4)
    uconv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(uconv4)
    uconv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(uconv4)

    # 16 -> 32
    deconv3 = Conv2DTranspose(start_neurons * 4, (3, 3), strides=(2, 2), padding="same")(uconv4)
    uconv3 = concatenate([deconv3, conv3])
    uconv3 = Dropout(0.5)(uconv3)
    uconv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(uconv3)
    uconv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(uconv3)

    # 32 -> 64
    deconv2 = Conv2DTranspose(start_neurons * 2, (3, 3), strides=(2, 2), padding="same")(uconv3)
    uconv2 = concatenate([deconv2, conv2])
    uconv2 = Dropout(0.5)(uconv2)
    uconv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(uconv2)
    uconv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(uconv2)

    # 64 -> 128
    deconv1 = Conv2DTranspose(start_neurons * 1, (3, 3), strides=(2, 2), padding="same")(uconv2)
    uconv1 = concatenate([deconv1, conv1])
    uconv1 = Dropout(0.5)(uconv1)
    uconv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(uconv1)
    uconv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(uconv1)
    uconv1 = Dropout(0.5)(uconv1)
    output_layer = Conv2D(1, (1, 1), padding="same", activation="sigmoid")(uconv1)

    return output_layer

# model
input_layer = Input((w_size, w_size, 1))
output_layer = build_model(input_layer, 26)
model = Model(input_layer, output_layer)
model.compile(loss=bce_dice_loss, optimizer=Adam(lr=1e-4), metrics=[my_iou_metric])
model.summary()


The function that generates image/mask pairs. It creates a 128x128 grayscale picture filled with random noise drawn from one of two ranges, either 0.0...0.75 or 0.25...1.0. A quarter of the picture is chosen at random and a randomly oriented ellipse is placed there; in another quarter a quadrilateral is placed and filled with random noise in the same way.

def next_pair():
    img_l = (np.random.sample((w_size, w_size, 1)) * 0.75).astype('float32')
    img_h = (np.random.sample((w_size, w_size, 1)) * 0.75 + 0.25).astype('float32')
    img = np.zeros((w_size, w_size, 2), dtype='float')

    i0_qua = math.trunc(np.random.sample() * 4.)
    i1_qua = math.trunc(np.random.sample() * 4.)
    while i0_qua == i1_qua:
        i1_qua = math.trunc(np.random.sample() * 4.)

    _qua = np.int(w_size / 4)
    qua = np.array([[_qua, _qua], [_qua, _qua * 3], [_qua * 3, _qua * 3], [_qua * 3, _qua]])
    p = np.random.sample() - 0.5

    r = qua[i0_qua, 0]
    c = qua[i0_qua, 1]
    r_radius = np.random.sample() * (radius_max - radius_min) + radius_min
    c_radius = np.random.sample() * (radius_max - radius_min) + radius_min
    rot = np.random.sample() * 360
    rr, cc = ellipse(
        r, c,
        r_radius, c_radius,
        rotation=np.deg2rad(rot),
        shape=img_l.shape
    )

    p0 = np.rint(np.random.sample() * (radius_max - radius_min) + radius_min)
    p1 = qua[i1_qua, 0] - (radius_max - radius_min)
    p2 = qua[i1_qua, 1] - (radius_max - radius_min)
    p3 = np.rint(np.random.sample() * radius_min)
    p4 = np.rint(np.random.sample() * radius_min)
    p5 = np.rint(np.random.sample() * radius_min)
    p6 = np.rint(np.random.sample() * radius_min)
    p7 = np.rint(np.random.sample() * radius_min)
    p8 = np.rint(np.random.sample() * radius_min)

    poly = np.array((
        (p1, p2),
        (p1 + p3, p2 + p4 + p0),
        (p1 + p5 + p0, p2 + p6 + p0),
        (p1 + p7 + p0, p2 + p8),
        (p1, p2),
    ))
    rr_p, cc_p = polygon(poly[:, 0], poly[:, 1], img_l.shape)

    if p > 0:
        img[:, :, :1] = img_l.copy()
        img[rr, cc, :1] = img_h[rr, cc]
        img[rr_p, cc_p, :1] = img_h[rr_p, cc_p]
    else:
        img[:, :, :1] = img_h.copy()
        img[rr, cc, :1] = img_l[rr, cc]
        img[rr_p, cc_p, :1] = img_l[rr_p, cc_p]

    img[:, :, 1] = 0.
    img[rr_p, cc_p, 1] = 1.

    return img

Let's build the training sequence of pairs and look at 10 random examples. Remember that the pictures are monochrome, grayscale.

_txy = [next_pair() for idx in range(train_num)]

f_imgs = np.array(_txy)[:, :, :, :1].reshape(-1, w_size, w_size, 1)
f_msks = np.array(_txy)[:, :, :, 1:].reshape(-1, w_size, w_size, 1)
del _txy

# look at 10 random pictures with their masks
fig, axes = plt.subplots(2, 10, figsize=(20, 5))
for k in range(10):
    kk = np.random.randint(train_num)
    axes[0, k].set_axis_off()
    axes[0, k].imshow(f_imgs[kk].squeeze())
    axes[1, k].set_axis_off()
    axes[1, k].imshow(f_msks[kk].squeeze())



First step. We train on the minimum starting set


The first step of our experiment is simple: we try to train the network to predict only the first 11 pictures.

batch_size = 10
val_len = 11
precision = 0.85

m0_select = np.zeros((f_imgs.shape[0]), dtype='int')
for k in range(val_len):
    m0_select[k] = 1

t = tqdm()
while True:
    fit = model.fit(f_imgs[m0_select > 0], f_msks[m0_select > 0],
                    batch_size=batch_size,
                    epochs=1,
                    verbose=0
                    )
    current_accu = fit.history['my_iou_metric'][0]
    current_loss = fit.history['loss'][0]
    t.set_description("accuracy {0:6.4f} loss {1:6.4f} ".
                      format(current_accu, current_loss))
    t.update(1)
    if current_accu > precision:
        break
t.close()

accuracy 0.8545 loss 0.0674 lenght 11 : : 793it [00:58, 14.79it/s]

We took the first 11 pictures from the initial sequence and trained the network on them. It does not matter now whether the network memorizes these particular pictures or generalizes; the main thing is that it can recognize these 11 pictures the way we need. Depending on the chosen dataset and accuracy, training could last a long, very long time, but here it takes only a few iterations. I repeat: right now it does not matter to us how or what the network has learned, only that it has reached the required prediction accuracy.

Now let's start the main experiment.


We will build a cheat sheet; we will build such cheat sheets separately for all three training sequences and compare their lengths. We take new picture/mask pairs from the constructed sequence and try to predict them with the network trained on the pairs selected so far. At the beginning those are only 11 picture/mask pairs, and the network is trained, perhaps not very well. If the mask of a new picture is predicted with acceptable accuracy, we throw this pair away: it contains no new information for the network, which already knows how to compute a mask for this picture. If the prediction accuracy is insufficient, we add this picture with its mask to our sequence and train the network until acceptable accuracy is reached on the selected sequence again. In other words, such a picture contains new information, so we add it to our training sequence and extract the information it contains by training.

batch_size = 50
t_batch_size = 1024
raw_len = val_len

t = tqdm(-1)
id_train = 0
# id_select = 1

while True:
    t.set_description("Accuracy {0:6.4f} loss {1:6.4f} selected img {2:5d} tested img {3:5d} ".
                      format(current_accu, current_loss, val_len, raw_len))
    t.update(1)

    if id_train == 1:
        fit = model.fit(f_imgs[m0_select > 0], f_msks[m0_select > 0],
                        batch_size=batch_size,
                        epochs=1,
                        verbose=0
                        )
        current_accu = fit.history['my_iou_metric'][0]
        current_loss = fit.history['loss'][0]
        if current_accu > precision:
            id_train = 0
    else:
        t_pred = model.predict(
            f_imgs[raw_len: min(raw_len + t_batch_size, f_imgs.shape[0])],
            batch_size=batch_size
        )
        for kk in range(t_pred.shape[0]):
            val_iou = get_iou_vector(
                f_msks[raw_len + kk].reshape(1, w_size, w_size, 1),
                t_pred[kk].reshape(1, w_size, w_size, 1) > 0.5)
            if val_iou < precision * 0.95:
                new_img_test = 1
                m0_select[raw_len + kk] = 1
                val_len += 1
                break
        raw_len += (kk + 1)
        id_train = 1

    if raw_len >= train_num:
        break
t.close()

 Accuracy 0.9338 loss 0.0266 selected img 1007 tested img 9985 : : 4291it [49:52, 1.73s/it] 

Here "accuracy" is used in the everyday sense of prediction quality, not as the standard Keras metric, and it is computed with the my_iou_metric subroutine.

Now let's compare how the same network with the same parameters works on a different sequence, the one with triangles.
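The article does not reproduce the triangle generator. Below is a minimal sketch of how next_pair() could be adapted; this is my assumption, not the author's code: the same quarter placement and the same skimage polygon() call, but with three vertices instead of four (random_triangle and its parameters p0..p4 are illustrative names).

# Sketch only (an assumption, not the article's code): a random triangle anchored
# at (r0, c0), built the same way as the quadrilateral in next_pair() but with
# three vertices. Reuses np, polygon, radius_min, radius_max from the cells above.
def random_triangle(r0, c0, shape):
    p0 = np.rint(np.random.sample() * (radius_max - radius_min) + radius_min)
    p1 = np.rint(np.random.sample() * radius_min)
    p2 = np.rint(np.random.sample() * radius_min)
    p3 = np.rint(np.random.sample() * radius_min)
    p4 = np.rint(np.random.sample() * radius_min)
    poly = np.array((
        (r0,           c0),
        (r0 + p1,      c0 + p2 + p0),
        (r0 + p3 + p0, c0 + p4),
        (r0,           c0),
    ))
    return polygon(poly[:, 0], poly[:, 1], shape)

In next_pair() the four-vertex poly and its polygon() call would then be replaced by rr_p, cc_p = random_triangle(p1, p2, img_l.shape).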



And we get a completely different result.

 Accuracy 0.9823 loss 0.0108 selected img 1913 tested img 9995 : : 6343it [2:11:36, 3.03s/it] 

The network selected 1913 pictures with "new" information, i.e. the information content of the triangle pictures is about half that of the quadrilateral pictures!

Let's check the same with the stars and run the network on the third sequence.
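The star generator is not shown in the article either. A possible sketch, again my assumption with illustrative names: a classic five-pointed star as a ten-vertex polygon whose vertices alternate between an outer and an inner radius around a center (r0, c0) in the chosen quarter.

# Sketch only (an assumption): a five-pointed star as a 10-vertex polygon,
# alternating outer/inner radius, with a random size and orientation.
# Reuses np, polygon, radius_min, radius_max from the cells above.
def random_star(r0, c0, shape):
    r_out = np.random.sample() * (radius_max - radius_min) + radius_min  # outer radius
    r_in = r_out * 0.5                          # inner radius (arbitrary ratio)
    rot = np.random.sample() * 2 * np.pi        # random orientation
    angles = rot + np.arange(10) * np.pi / 5    # 10 vertices, 36 degrees apart
    radii = np.where(np.arange(10) % 2 == 0, r_out, r_in)
    rows = r0 + radii * np.cos(angles)
    cols = c0 + radii * np.sin(angles)
    return polygon(rows, cols, shape)

Here (r0, c0) would be the center of the chosen quarter, e.g. qua[i1_qua, 0], qua[i1_qua, 1].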



and we get:

 Accuracy 0.8985 loss 0.0478 selected img 476 tested img 9985 : : 2188it [16:13, 1.16it/s] 

As you can see, the stars turned out to be the most informative: only 476 pictures in the cheat sheet.

So we have grounds to judge how hard these geometric shapes are for the neural network to perceive. The simplest is the star, with only 476 pictures in the cheat sheet, then the quadrilateral with its 1007, and the hardest is the triangle: you need 1913 pictures to learn it.

Keep in mind that this is how it looks to us; for people these are just pictures, but for the network it is a course of lectures on recognition, and the course about triangles turned out to be the hardest.

Now for the serious part


At first glance, all these ellipses and triangles seem like child's play, sand pies and Lego. But here is a specific and serious question: if we apply some kind of preprocessing, a filter, to the initial sequence, how will the complexity of the sequence change? For example, let's take the same ellipses and quadrilaterals and apply the following preprocessing to them:

from scipy.ndimage import gaussian_filter

_tmp = [gaussian_filter(idx, sigma=1) for idx in f_imgs]
f1_imgs = np.array(_tmp)[:, :, :, :1].reshape(-1, w_size, w_size, 1)
del _tmp

fig, axes = plt.subplots(2, 5, figsize=(20, 7))
for k in range(5):
    kk = np.random.randint(train_num)
    axes[0, k].set_axis_off()
    axes[0, k].imshow(f1_imgs[kk].squeeze(), cmap="gray")
    axes[1, k].set_axis_off()
    axes[1, k].imshow(f_msks[kk].squeeze(), cmap="gray")



At first glance everything looks the same, the same ellipses, the same polygons, but the network started to behave quite differently:

 Accuracy 1.0575 loss 0.0011 selected img 7963 tested img 9999 : : 17765it [29:02:00, 12.40s/it] 

This requires a little explanation: we do not use augmentation because the shape of the polygon and the shape of the ellipse are already chosen randomly. Augmentation would therefore add no new information and makes no sense in this case.

But, as can be seen from the result, a simple gaussian_filter created many problems for the network; it generated a lot of new, and probably unnecessary, information.

Well, for lovers of simplicity in its pure form, let's take the same ellipses and polygons, but without any randomness in the coloring.
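The modified generator is not shown in the article; the idea, as a sketch (the 0.25/0.75 gray levels here are my assumption), is to replace the two noise textures at the top of next_pair() with constant fills and leave everything else unchanged:

# Sketch only (assumption): flat gray levels instead of noise textures in next_pair().
img_l = np.full((w_size, w_size, 1), 0.25, dtype='float32')   # darker flat tone
img_h = np.full((w_size, w_size, 1), 0.75, dtype='float32')   # lighter flat tone
# the rest of next_pair() (quarter choice, ellipse, polygon, mask) stays the same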



The result suggests that the random coloring is not at all a harmless addition:

 Accuracy 0.9004 loss 0.0315 selected img 251 tested img 9832 : : 1000it [06:46, 1.33it/s] 

The network got by with the information extracted from just 251 pictures, almost four times fewer than from the pictures painted with noise.

The purpose of the article is to show a tool and examples of its use on toy problems, Lego in the sandbox. We have obtained a tool for comparing two training sequences: we can estimate how much our preprocessing complicates or simplifies the training sequence, and how easy or hard a given primitive in the training sequence is to detect.

The possibility of applying this Lego example to real cases is obvious, but readers' real training sets and networks are the readers' own business.

Source: https://habr.com/ru/post/439122/