I am training convolutional networks with Keras (TensorFlow backend). When training several networks one after another in a loop, after roughly 6-10 networks have been trained the process terminates with the following messages:

2018-08-01 03:12:29.507320: W T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:2440]
2018-08-01 03:12:32.384245: W T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:2440]
2018-08-01 03:12:43.149686: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 87.89MiB. Current allocation summary follows.
2018-08-01 03:12:43.150195: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (256): Total Chunks: 239, Chunks in use: 238. 59.8KiB allocated for chunks. 59.5KiB in use in bin. 1.9KiB client-requested in use in bin.
2018-08-01 03:12:43.150888: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (512): Total Chunks: 64, Chunks in use: 63. 32.0KiB allocated for chunks. 31.5KiB in use in bin. 31.5KiB client-requested in use in bin.
2018-08-01 03:12:43.151555: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1024): Total Chunks: 5, Chunks in use: 5. 5.3KiB allocated for chunks. 5.3KiB in use in bin. 5.0KiB client-requested in use in bin.
2018-08-01 03:12:43.152226: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2048): Total Chunks: 8, Chunks in use: 7. 17.3KiB allocated for chunks. 14.0KiB in use in bin. 14.0KiB client-requested in use in bin.
2018-08-01 03:12:43.152878: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4096): Total Chunks: 1, Chunks in use: 1. 6.8KiB allocated for chunks. 6.8KiB in use in bin. 6.8KiB client-requested in use in bin.
2018-08-01 03:12:43.153526: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.

(and similar messages for the remaining bins)

2018-08-01 03:12:43.165310: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:646] Bin for 87.89MiB was 64.00MiB, Chunk State:
2018-08-01 03:12:43.165660: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:652] Size: 87.89MiB | Requested Size: 512.0KiB | in_use: 0, prev: Size: 512B | Requested Size: 512B | in_use: 1, next: Size: 256B | Requested Size: 4B | in_use: 1
2018-08-01 03:12:43.166916: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000500D60000 of size 1280
2018-08-01 03:12:43.167233: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000500D60500 of size 256
2018-08-01 03:12:43.167544: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000500D60600 of size 256

(many similar chunk messages follow)

2018-08-01 03:12:43.345725: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats:
Limit:        591550873
InUse:        460528896
MaxInUse:     591547392
NumAllocs:    153014
MaxAllocSize: 368753664

2018-08-01 03:12:43.346681: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ********************************______________******************_____****************xxxxxxxxxxxxxxx
2018-08-01 03:12:43.485313: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[16,64,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:/Users/[path]/multiply_learning.py", line 118, in <module>
    callbacks=callbacks)
  File "C:\Users\[path]\venv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\[path]\venv\lib\site-packages\keras\engine\training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\[path]\venv\lib\site-packages\keras\engine\training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "C:\Users\[path]\venv\lib\site-packages\keras\engine\training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\[path]\venv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "C:\Users\[path]\venv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2636, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\Users\[path]\venv\lib\site-packages\tensorflow\python\client\session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "C:\Users\[path]\venv\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,64,150,150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[Node: vgg16_7/block1_conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](vgg16_7/block1_conv1/Relu, block1_conv2/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  [[Node: dense_16/BiasAdd/_1005 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_236_dense_16/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Process finished with exit code 1
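For context, the loop that trains the networks (multiply_learning.py in the traceback) is structured roughly like the sketch below. All names, paths and hyperparameter values here are placeholders rather than the actual code; the point is simply that a new VGG16-based model is built and trained on every iteration:

    # Rough sketch of the training loop; placeholder names and values,
    # not the real multiply_learning.py.
    from keras.applications import VGG16
    from keras.models import Model
    from keras.layers import Dense, Flatten
    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1. / 255)
    train_gen = datagen.flow_from_directory('data/train', target_size=(150, 150),
                                            batch_size=16, class_mode='binary')
    val_gen = datagen.flow_from_directory('data/val', target_size=(150, 150),
                                          batch_size=16, class_mode='binary')

    for run_idx in range(20):                       # crash appears after ~6-10 iterations
        # Build a fresh VGG16-based classifier on each iteration.
        base = VGG16(weights='imagenet', include_top=False,
                     input_shape=(150, 150, 3))     # 150x150 matches the OOM tensor shape
        x = Flatten()(base.output)
        x = Dense(256, activation='relu')(x)
        out = Dense(1, activation='sigmoid')(x)
        model = Model(base.input, out)
        model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                      metrics=['accuracy'])
        model.fit_generator(train_gen, steps_per_epoch=100, epochs=5,
                            validation_data=val_gen, validation_steps=50,
                            callbacks=[])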

When training a single network, everything works fine with any number of epochs. What causes these crashes, and how can they be eliminated?
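Following the hint in the traceback, I assume a list of allocated tensors at the moment of the OOM could be obtained by passing report_tensor_allocations_upon_oom through a RunOptions object. A minimal sketch of how that might look with Keras 2.2 on the TensorFlow 1.x backend (I have not verified that compile() actually forwards the options on this setup):

    # Sketch: ask TensorFlow to report allocated tensors when OOM happens,
    # as suggested by the hint in the traceback. Assumption: model.compile()
    # forwards extra keyword arguments such as 'options' to the backend
    # train function. 'model' is the Keras model built in the loop above.
    import tensorflow as tf

    run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'],
                  options=run_opts)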
