The following code simulates loading and parsing HTML: it extracts links and stores them in a global array.

If the links are saved as in the line marked "Option 1", Node starts eating memory: about 1 GB for 30,000 links. If you comment that line out and save them with the "Option 2" method instead, everything is fine: memory still grows, but only gradually as the array of links grows, and you can store a million of them.

I cannot understand the difference, because in both cases a primitive string is stored in the array. I have checked in every way I could: calling $().text() returns a string, not an object that could also carry a reference to the whole DOM and the source text. Yet the impression is that all of this somehow stays attached and the garbage collector cannot free the memory.

    var cheerio = require('cheerio');
    var _ = require('lodash');

    var links = [];

    for (var i = 0; i < 100000; i++) {
        getS();
        console.log('%s links collected', links.length);
    }

    // Builds a large fake HTML page, parses it, and stores the link texts.
    function getS() {
        var body = '<body>';
        for (var i = 0; i < 20; i++) {
            body += '<a class="i-ljuser-username" href="#">' + Math.random() * 100000 + '</a>';
            // Pad the page with filler; it ends up at roughly 285,000 characters.
            while (body.length < 15000 * i) {
                body += '<span>' + Math.random() * 100000 + '</span>';
            }
        }
        body += '</body>';

        var $ = cheerio.load(body);
        var list = $('a.i-ljuser-username');
        _.forEach(list, function (item) {
            links.push($(item).text()); // Option 1
            //links.push(($(item).text() + ' ').trim()); // Option 2
        });
    }
  • I think that by writing $(item).text()+' ' you explicitly create a new string that "forgets where it came from". In the first case the virtual machine tries to save on copying and stores not just a string but an object that looks like a string yet remembers "too much". - KoVadim
  • I'm more interested in when exactly the virtual machine starts to economize like this, because performing this kind of ritual on every string is absurd. Here I happened to hit a case where it shows up clearly and quickly, but what if it doesn't? - Alexandr Kozlov
  • If you want to see exactly what is eating the memory, you can take a heap dump and look there. .text() may in fact return something other than a regular string. - etki
  • "I'm more interested in when exactly the virtual machine starts to economize." That's a very difficult question, and my answer is short: if you want more predictability, use more predictable languages. Or dig into how Node works. - KoVadim
  • Judging by the cheerio source code, plain strings are used there too, so this is some Node mystery. One guess is that the garbage collector simply has not run because the limit has not been reached yet, and the second variant somehow triggers it. How exactly was memory usage measured, and did it ever end in an out-of-memory error? - etki
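
The guess in the comments (a stored string that "remembers too much") can be tested without cheerio. The sketch below rests on an assumption the thread does not confirm: that V8 may represent a substring as a lightweight slice that keeps its large parent string reachable, and that concatenating and trimming yields a string no longer tied to that parent. All function names are hypothetical, and the sizes are arbitrary. It also shows one way to take the memory readings asked about in the last comment, using process.memoryUsage().

```javascript
'use strict';

// All helper names below are hypothetical, chosen for this illustration.

// Build one large string with a short token at its start.
function buildBody(sizeInChars) {
  // padEnd keeps the token at 20+ characters, comparable to the link texts.
  const token = String(Math.random() * 100000).padEnd(20, '0');
  return { body: token + 'x'.repeat(sizeInChars), tokenLength: token.length };
}

// Analogue of Option 1: keep the substring exactly as returned.
function takeSlice(body, length) {
  return body.substring(0, length);
}

// Analogue of Option 2: the same concatenate-and-trim trick as in the question.
function takeCopy(body, length) {
  return (body.substring(0, length) + ' ').trim();
}

// Build `count` bodies and keep one short piece of each.
function collect(take, count, sizeInChars) {
  const kept = [];
  for (let i = 0; i < count; i++) {
    const page = buildBody(sizeInChars);
    kept.push(take(page.body, page.tokenLength));
  }
  return kept;
}

// Heap in use, in megabytes. Forces a full GC first, but only if node
// was started with --expose-gc; otherwise it just reads the counter.
function heapUsedMb() {
  if (typeof global.gc === 'function') global.gc();
  return Math.round(process.memoryUsage().heapUsed / (1024 * 1024));
}

const base = heapUsedMb();
const slices = collect(takeSlice, 100, 200000);
const afterSlices = heapUsedMb();
const copies = collect(takeCopy, 100, 200000);
const afterCopies = heapUsedMb();

console.log('kept %d slices, heap grew by %d MB', slices.length, afterSlices - base);
console.log('kept %d copies, heap grew by %d MB', copies.length, afterCopies - afterSlices);
```

Running it with node --expose-gc makes the two readings comparable, because garbage is collected before each measurement; that also helps separate "the collector has not run yet" from "the memory is genuinely still referenced". The numbers depend on the Node version, so treat them as an indication only. For a closer look, require('v8').writeHeapSnapshot() (available since Node 11.13) writes a heap snapshot that can be opened in Chrome DevTools.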
