For example, there is an image of 500ั…500ั…3 (or x4, if RGBA ) and a core for a convolution of 3ั…3ั…3 .

Why does the output image after the convolution have a larger number of channels (500x500x9)? How does this happen, and what values are written to the output?

  • Good afternoon. I would suggest the Data Science resource; there are many answers on this topic there, while on this site the question may not be understood by the community or may be closed. - True-hacker
  • How many filters are you using? If you want to get an adequate answer, please include the corresponding code in the question - MaxU

1 answer

One of the parameters of a convolutional layer is the number of filters, which sets the depth of the output (next) layer produced by this convolutional layer. In general, the depth of a layer is not the number of color channels; rather, it can be viewed as a set of detected features (for example, vertical lines, horizontal lines, diagonals at certain angles X, Y, Z, arcs, circles, ellipses, etc.). The more convolutional layers and filters we have, the more complex the features the ANN learns to recognize (for example, a human eye, a bird's beak, or the contour of a motorcycle or car). Each filter in your case has a dimension of 3x3x3, and judging by the output dimension of 500x500x9, there were 9 filters in this convolutional layer.
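Since the question doesn't show which framework is used, here is a minimal NumPy sketch (not any library's actual implementation) of how each filter spans all input channels and the number of filters determines the output depth. The small 8x8 image and random filter values are illustrative stand-ins:

```python
import numpy as np

def conv2d_valid(image, filters):
    """Naive 2D convolution, 'valid' padding, stride 1.

    image:   (H, W, C_in)
    filters: (k, k, C_in, n_filters) -- each filter spans ALL input channels
    returns: (H-k+1, W-k+1, n_filters) -- output depth = number of filters
    """
    H, W, _ = image.shape
    k, _, _, n_filters = filters.shape
    out = np.zeros((H - k + 1, W - k + 1, n_filters))
    for f in range(n_filters):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                # one output value: sum over the k x k x C_in window
                out[i, j, f] = np.sum(image[i:i+k, j:j+k, :] * filters[..., f])
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))       # small stand-in for a 500x500x3 image
filters = rng.random((3, 3, 3, 9))  # nine 3x3x3 filters

out = conv2d_valid(image, filters)
print(out.shape)  # (6, 6, 9): the last dimension equals the number of filters
```

Each of the 9 output channels is a "feature map": the same image convolved with a different 3x3x3 filter.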

If 500ั…500ั…3 apply one filter with a 3ั…3ั…3 convolution 3ั…3ั…3 and padding='same' to a color image of dimension 500ั…500ั…3 then we will have a 2D matrix / tensor of dimension 500x500x1 . The last dimension number corresponds to the number of filters.