Greetings. Python 2.7, Ubuntu 16.04, HDD. As part of a task I need to traverse about 10 folders, each containing approximately 4,500 small PNG images. I process each image with the following Python code:

ndimage.imread(image_file).astype(float) 

Depending on the circumstances, the average running time of this call per image varies by three orders of magnitude. Here is what I have managed to notice:

1) If a folder was partially processed before the program was interrupted, then on the next launch that folder is processed quickly up to the same point, after which the speed drops again.

2) After a reboot, which files and folders are processed quickly changes unpredictably.
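To put numbers on this, the per-image time can be logged with a minimal sketch like the following (not part of my program; the folder path is a placeholder):

    import os
    import time
    from scipy import ndimage

    folder = 'large/A'  # placeholder: one of the folders with the png files
    for name in sorted(os.listdir(folder)):
        start = time.time()
        ndimage.imread(os.path.join(folder, name)).astype(float)
        print('%s: %.6f s' % (name, time.time() - start))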

Just in case, here is the full (small) code snippet:

    import os
    import pickle

    import numpy as np
    from scipy import ndimage

    # image_size, pixel_depth, large_data and small_data are defined
    # elsewhere in the program.

    def load_letter(folder, min_num_images):
        """Load the images of one folder into a 3-D float32 array."""
        image_files = os.listdir(folder)
        dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                             dtype=np.float32)
        print(folder)
        num_images = 0
        for image in image_files:
            image_file = os.path.join(folder, image)
            try:
                # Normalize pixel values to roughly [-0.5, 0.5].
                image_data = (ndimage.imread(image_file).astype(float) -
                              pixel_depth / 2) / pixel_depth
                if image_data.shape != (image_size, image_size):
                    raise Exception('Unexpected image shape: %s' % str(image_data.shape))
                dataset[num_images, :, :] = image_data
                num_images = num_images + 1
                if num_images % 400 == 0:
                    print(str(num_images) + '\n')
            except IOError as e:
                print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

        dataset = dataset[0:num_images, :, :]
        if num_images < min_num_images:
            raise Exception('Many fewer images than expected: %d < %d' %
                            (num_images, min_num_images))

        print('Full dataset tensor:', dataset.shape)
        print('Mean:', np.mean(dataset))
        print('Standard deviation:', np.std(dataset))
        return dataset

    def maybe_pickle(data_folders, min_num_images_per_class, force=True):
        """Pickle each folder's dataset unless it already exists (and not force)."""
        dataset_names = []
        data_folders.reverse()
        for folder in data_folders:
            print('\n\n\n\n\n\n' + folder + '\n\n\n\n\n\n')
            set_filename = folder + '.pickle'
            dataset_names.append(set_filename)
            if os.path.exists(set_filename) and not force:
                print('%s already present - Skipping pickling.' % set_filename)
            else:
                print('Pickling %s.' % set_filename)
                dataset = load_letter(folder, min_num_images_per_class)
                try:
                    with open(set_filename, 'wb') as f:
                        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
                except Exception as e:
                    print('Unable to save data to', set_filename, ':', e)
        return dataset_names

    train_datasets = maybe_pickle(large_data, 45000)
    test_datasets = maybe_pickle(small_data, 1800)
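One detail worth noting in the snippet above: maybe_pickle defaults to force=True, so every run reprocesses all folders even when a .pickle file already exists. A minimal change (not in the run described above) lets reruns skip folders that are already pickled:

    # Skip folders whose .pickle file already exists on a rerun.
    train_datasets = maybe_pickle(large_data, 45000, force=False)
    test_datasets = maybe_pickle(small_data, 1800, force=False)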
  • Maybe it's in the disk cache? - Surfin Bird
  • Came to the same conclusions, thanks. - Vladislav Anisimov
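A rough way to verify the page-cache explanation suggested above (a sketch of mine, not from the thread; the path is hypothetical) is to time the same read before and after dropping the kernel caches, which requires root:

    import os
    import time
    from scipy import ndimage

    path = 'large/A/a01.png'  # hypothetical image path

    def timed_read(p):
        start = time.time()
        ndimage.imread(p).astype(float)
        return time.time() - start

    print('warm read: %.6f s' % timed_read(path))  # likely already cached
    # Flush the page cache (run the script as root for this to work).
    os.system('sync && echo 3 > /proc/sys/vm/drop_caches')
    print('cold read: %.6f s' % timed_read(path))  # hits the HDD
    print('warm read: %.6f s' % timed_read(path))  # cached again

If the cold read is orders of magnitude slower than the warm ones, the page cache is indeed what makes the difference.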
