One of the parameters of a convolutional layer is the number of filters, which sets the depth of that layer's output (the input to the next layer). In general, the layer depth is not the number of color channels; rather, it can be viewed as a set of detected features (for example, vertical, horizontal, or diagonal lines, lines at a certain angle X, Y, Z, arcs, circles, ellipses, etc.). The more convolutional layers we have, and the more filters in them, the more complex the features the ANN can learn to recognize (for example, a human eye, a bird's beak, or the contour of a motorcycle or car). Each filter in your case has a dimension of 3x3x3, and judging by the output dimension of 500x500x9, there were 9 filters in this convolutional layer.
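To make the shape arithmetic concrete, here is a minimal sketch in plain Python. The helper names `conv2d_output_shape` and `conv2d_param_count` are made up for illustration and are not part of any library; the sketch assumes stride 1, square kernels, and one bias per filter.

```python
def conv2d_output_shape(height, width, num_filters, kernel_size=3, padding="same"):
    """Return (H, W, depth) of a stride-1 2D convolution output.

    The output depth is always the number of filters; it does not depend
    on the number of input channels.
    """
    if padding == "same":
        # Zero padding keeps the spatial size unchanged.
        out_h, out_w = height, width
    elif padding == "valid":
        # No padding: the kernel must fit entirely inside the image.
        out_h = height - kernel_size + 1
        out_w = width - kernel_size + 1
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return (out_h, out_w, num_filters)


def conv2d_param_count(in_channels, num_filters, kernel_size=3, use_bias=True):
    """Number of trainable parameters: one k x k x C kernel (plus bias) per filter."""
    weights = kernel_size * kernel_size * in_channels * num_filters
    return weights + (num_filters if use_bias else 0)


# 500x500x3 input, nine 3x3x3 filters, padding='same'
print(conv2d_output_shape(500, 500, num_filters=9))      # (500, 500, 9)
print(conv2d_param_count(in_channels=3, num_filters=9))  # 252
```

Each of the 9 filters has 3*3*3 = 27 weights plus one bias, which gives 252 parameters in total.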
If we apply a single 3x3x3 filter with padding='same' to a color image of dimension 500x500x3, we get a 2D matrix / tensor of dimension 500x500x1. The last dimension corresponds to the number of filters.
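The single-filter case can be checked with a small pure-Python sketch. The function name `conv2d_same_single_filter` is hypothetical, and a tiny 4x5x3 image is used instead of 500x500x3 to keep it fast. It assumes stride 1, zero padding, and no bias, and it computes the cross-correlation that deep learning frameworks call "convolution".

```python
def conv2d_same_single_filter(image, kernel):
    """Apply one k x k x C filter with zero 'same' padding and stride 1.

    image:  nested list of shape H x W x C
    kernel: nested list of shape k x k x C (k odd)
    Returns a nested list of shape H x W x 1.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k = len(kernel)
    pad = k // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    # Positions outside the image count as zeros (padding).
                    if 0 <= yy < height and 0 <= xx < width:
                        # All input channels are summed into ONE output value.
                        for c in range(channels):
                            total += image[yy][xx][c] * kernel[dy][dx][c]
            row.append([total])
        output.append(row)
    return output


# Tiny 4 x 5 RGB image of ones and a 3 x 3 x 3 filter of ones.
image = [[[1.0, 1.0, 1.0] for _ in range(5)] for _ in range(4)]
kernel = [[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(3)]

result = conv2d_same_single_filter(image, kernel)
shape = (len(result), len(result[0]), len(result[0][0]))
print(shape)  # (4, 5, 1)
```

The three color channels collapse into one number per pixel because the filter spans the full input depth. Running the same loop once per filter and stacking the results along the last axis gives the HxWx9 output from the question.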