Good day to all!

I use Symfony 3 and Doctrine in my project. The task is to process a large data set and, for each element in it, update a record in the database. So there is a foreach loop like this:

foreach ($array as $item) {
    // $id and $value presumably come from $item
    $entity = $em->getRepository('AppBundle:Model')->find($id);
    $entity->setParameter($value);
    $em->flush(); // write the change to the database on every iteration
}

The problem is that memory usage grows after each iteration of the loop, and when the array holds tens of thousands of elements, memory usage climbs into the gigabytes.

I have read, including on the English-language StackOverflow, suggestions to use the following:

  • $em->clear();
  • $this->em->getConnection()->getConfiguration()->setSQLLogger(null);
  • gc_collect_cycles();

But none of this helps, and the memory still increases with each iteration. I hope for your help and thank you in advance!

1 answer

It may depend on many factors, not only on Doctrine (for example, on the PHP version). It is also not entirely clear why you call flush() inside the foreach (judging by the question, roughly 10,000 times). flush() calls UnitOfWork::commit(), which does a lot of work internally, and some of those operations, such as iterating over entities and saving snapshots, can under certain circumstances (including, as mentioned, the language version) be sources of memory leaks. The UnitOfWork pattern itself, and its concrete implementation in Doctrine, assume in most cases that you do not save each entity separately but call EntityManager::flush() once at the end of the transaction.

Here's a good article: http://www.doctrine-project.org/2009/08/07/doctrine2-batch-processing.html, which describes exactly your case: processing large amounts of data with Doctrine. The subsection "Mass object processing" applies to you directly.
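To make the batching idea concrete, here is a minimal sketch of flushing in batches instead of on every iteration. It assumes the input is a map of entity id to new value (the question does not show how $id and $value are obtained), and the helper name updateInBatches and the default batch size of 100 are hypothetical. The entity name and setParameter() come from the question; find(), flush() and clear() are the usual entity manager and repository calls. The $em argument is deliberately left untyped so the sketch does not depend on a particular Doctrine version.

```php
<?php

/**
 * Hypothetical helper: updates entities in batches rather than
 * flushing once per item.
 *
 * Assumption: $valuesById maps an entity id to the new value.
 * $em is expected to behave like Doctrine's entity manager
 * (getRepository(), flush(), clear()).
 */
function updateInBatches($em, iterable $valuesById, int $batchSize = 100): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $processed = 0;

    foreach ($valuesById as $id => $value) {
        // Fetch the repository inside the loop, because clear()
        // detaches everything the entity manager was tracking.
        $entity = $em->getRepository('AppBundle:Model')->find($id);
        if ($entity === null) {
            continue; // no such row, nothing to update
        }

        $entity->setParameter($value);
        ++$processed;

        if ($processed % $batchSize === 0) {
            $em->flush(); // write the pending batch in one commit
            $em->clear(); // detach entities so they can be garbage collected
        }
    }

    // Write whatever is left over from the last, incomplete batch.
    $em->flush();
    $em->clear();

    return $processed;
}
```

Calling updateInBatches($em, $valuesById, 200) would then commit every 200 updated entities. Whether this alone stops the growth depends on the other factors mentioned above, so the batch size is worth tuning against measured memory usage.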

I also can't help noting that if the real functionality is not much different from the example (updating a couple of fields across many rows), this is a typical case where it is better to query the database directly, without any ORM. (I understand there may be project constraints outside your control that rule out a "raw query".) If it has to go through the ORM, then at least build the update with a query builder. For functionality more or less like this, there is no objective reason to create 10,000 objects and loop over them.
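As an illustration of the query builder route, the sketch below issues a DQL UPDATE per item, so no entity objects are loaded or tracked. It rests on assumptions: the input is again a map of id to new value, the entity field behind setParameter() is taken to be named parameter, and the helper name updateWithoutHydration is hypothetical. The $em argument is left untyped and is expected to behave like Doctrine's entity manager.

```php
<?php

/**
 * Hypothetical helper: updates rows through DQL UPDATE statements
 * instead of loading each entity.
 *
 * Assumptions: $valuesById maps an entity id to the new value, and
 * the mapped field is called "parameter" (guessed from setParameter()).
 */
function updateWithoutHydration($em, iterable $valuesById): int
{
    $affected = 0;

    foreach ($valuesById as $id => $value) {
        // For an UPDATE query, execute() returns the number of affected rows.
        $affected += $em->createQueryBuilder()
            ->update('AppBundle:Model', 'm')
            ->set('m.parameter', ':value')
            ->where('m.id = :id')
            ->setParameter('value', $value)
            ->setParameter('id', $id)
            ->getQuery()
            ->execute();
    }

    return $affected;
}
```

Note that DQL UPDATE statements go straight to the database, so entities already loaded in the entity manager will not reflect the new values, and lifecycle callbacks on the entity do not run. If the same value applies to many rows, a single UPDATE with an IN condition would reduce the number of queries further.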

  • Thank you for the answer! Honestly, I did not know about that nuance of flush(). I had come across the article while searching, but apparently I read it inattentively. - kover-samolet