There is a piece of code that loads lines from a file into a collection with unique values.

    var WORDS = new Set()

    let file = fs.readFileSync('file.txt') // 1,500,000+ lines
    // ~10 ms elapsed
    let text = iconv.decode(file, 'windows-1251')
    // ~100 ms elapsed
    let list = text.split('\n')
    // ~500 ms elapsed
    let i = 0
    while (list[i] != null) { // Faster than "WORDS = new Set(list)"
        WORDS.add(list[i++])
    }
    // ~1100 ms elapsed

The slowest parts of the code are splitting the text into array elements and the iteration. Is it possible to optimize this?

UPD:
All of this is done so that a value can be looked up quickly in the collection:

    WORDS.has('string') // true or false

So if there are other ways to store and search unique values, I'm open to them.

    2 answers

    Well, joking aside, you access the same array element twice per iteration. So this will definitely be faster:

        let length = list.length;
        for (let i = 0; i < length; i++) {
            WORDS.add(list[i]); // each element is read only once
        }

    As for the existence check alone, it is hard to say anything without measurements. Set works with keys of any type, which in theory should make it slower than a plain object, the base type for everything in JS, which uses strings as keys. I mean something like this:

        let hash = {};
        hash[list[i]] = true;
        console.log(hash['string']);
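    One caveat with the plain-object approach, as a minimal sketch: an object literal inherits property names such as `constructor` and `toString` from `Object.prototype`, so a bare `hash[key]` lookup can report words that were never added. Creating the object with `Object.create(null)` avoids that. The `makeWordIndex` helper and the sample words are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: builds a prototype-less lookup table from an array of words.
function makeWordIndex(words) {
  const index = Object.create(null); // no inherited keys such as "constructor"
  for (let i = 0; i < words.length; i++) {
    index[words[i]] = true;
  }
  return index;
}

const plain = {};
const index = makeWordIndex(['apple', 'banana']);

// A plain object "contains" inherited property names even though nothing was added.
const plainHasConstructor = Boolean(plain['constructor']);
// The prototype-less table only knows the keys that were stored explicitly.
const indexHasConstructor = Boolean(index['constructor']);
const indexHasApple = index['apple'] === true;

console.log(plainHasConstructor, indexHasConstructor, indexHasApple); // true false true
```

    If the word list can contain strings like "constructor" or "toString", this matters for correctness regardless of which structure turns out to be faster.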

    Adding to Set:

        const CharFactory = {
            count: 0,
            getChar() { return 'some text' + this.count++; },
            reset() { this.count = 0; }
        };
        const ITERATION = 1000000;
        const set = new Set();

        console.time('add in Set');
        for (let i = 0; i < ITERATION; i++) {
            set.add(CharFactory.getChar());
        }
        console.timeEnd('add in Set');

    Adding to Object:

        const CharFactory = {
            count: 0,
            getChar() { return 'some text' + this.count++; },
            reset() { this.count = 0; }
        };
        const ITERATION = 1000000;
        const hash = {};

        console.time('add in Object');
        for (let i = 0; i < ITERATION; i++) {
            hash[CharFactory.getChar()] = true;
        }
        console.timeEnd('add in Object');

    Checking in Set:

        const CharFactory = {
            count: 0,
            getChar() { return 'some text' + this.count++; },
            reset() { this.count = 0; }
        };
        const ITERATION = 1000000;
        const set = new Set();

        for (let i = 0; i < ITERATION; i++) {
            set.add(CharFactory.getChar());
        }
        CharFactory.reset();

        console.time('has in Set');
        for (let i = 0; i < ITERATION; i++) {
            let isCharExistValid = set.has(CharFactory.getChar());
        }
        console.timeEnd('has in Set');

    Checking in Object:

        const CharFactory = {
            count: 0,
            getChar() { return 'some text' + this.count++; },
            reset() { this.count = 0; }
        };
        const ITERATION = 1000000;
        const hash = {};

        for (let i = 0; i < ITERATION; i++) {
            hash[CharFactory.getChar()] = true;
        }
        CharFactory.reset();

        console.time('has in Object');
        for (let i = 0; i < ITERATION; i++) {
            let isCharExistValid = hash[CharFactory.getChar()];
        }
        console.timeEnd('has in Object');

    Although this really needs to be tested in the Node version you actually run, I still could not resist and wrote the tests for the browser.
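    For running the same comparison in Node, a rough sketch could look like the following. It assumes a runtime where `performance.now()` is available as a global (recent Node versions and browsers); `ITERATIONS`, `makeKey`, and `timeIt` are hypothetical names, and the numbers it prints depend on the engine and machine, so they are only a starting point for your own measurements.

```javascript
// Minimal timing sketch; ITERATIONS, makeKey and timeIt are illustrative assumptions.
const ITERATIONS = 1000000;
const makeKey = (i) => 'some text' + i;

// Runs fn once and returns the elapsed time in milliseconds.
function timeIt(fn) {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

const set = new Set();
const hash = {};
let hits = 0; // counting hits makes it less likely that the engine discards the lookups

const results = {
  'add in Set': timeIt(() => {
    for (let i = 0; i < ITERATIONS; i++) set.add(makeKey(i));
  }),
  'add in Object': timeIt(() => {
    for (let i = 0; i < ITERATIONS; i++) hash[makeKey(i)] = true;
  }),
  'has in Set': timeIt(() => {
    for (let i = 0; i < ITERATIONS; i++) if (set.has(makeKey(i))) hits++;
  }),
  'has in Object': timeIt(() => {
    for (let i = 0; i < ITERATIONS; i++) if (hash[makeKey(i)]) hits++;
  }),
};

for (const [label, ms] of Object.entries(results)) {
  console.log(label + ': ' + ms.toFixed(1) + ' ms');
}
```

    A single run like this is noisy; repeating it several times and on the real word list gives a more trustworthy picture.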

    • Indeed, I had not thought of that. The code really did get faster, but unfortunately only by 50 ms. Also, I always thought that length is a getter and that reading it triggers an extra count of the elements, which takes time. At least, that is how it was on the front end. - Vasya Shmarovoz
    • As for objects, the code was originally written that way, but paradoxically Set works one and a half times faster, so I switched to it. - Vasya Shmarovoz
    • As for length, you are right, and not only because it is a getter; it ought to be recalculated when the array changes rather than on access, but I am already used to writing it this way. - user220409
    • I tested your examples in Node: indeed, objects work faster. But with my code, for some reason, the results are different: rextester.com/GKVS16394 - Vasya Shmarovoz
    • Plus, your object-creation example has no assignments: the keys are not created, only empty slots are looked up, so naturally it is faster. - Vasya Shmarovoz

    For performance, it is worth making only one pass over the string.

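        let WORDS = {};
        let file = fs.readFileSync('file.txt'); // 1,500,000+ lines
        let text = iconv.decode(file, 'windows-1251');
        let len = text.length;
        let offset = 0; // start of the current line
        for (let i = 0; i < len; ++i) {
            if (text[i] == '\n' || text[i] == '\r') {
                // Store only non-empty lines
                if (offset != i) {
                    WORDS[text.substring(offset, i)] = true;
                }
                // Always step past the line break so "\r\n" does not leak into the next key
                offset = i + 1;
            }
        }
        // Store the last line if the text does not end with a line break
        if (offset < len) {
            WORDS[text.substring(offset, len)] = true;
        }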
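    A variant of the same single-pass idea, sketched under the assumption that the native `String.prototype.indexOf` finds line breaks faster than a character-by-character loop written in JavaScript. That assumption is not measured here and should be benchmarked against `split` on the real file. The `collectLines` function and the sample text are hypothetical; unlike the loop above, this sketch treats only "\n" and "\r\n" as line breaks, not a lone "\r".

```javascript
// Hypothetical helper: collects unique non-empty lines into a Set in one pass,
// using indexOf to jump to each "\n" instead of inspecting every character.
function collectLines(text) {
  const words = new Set();
  let offset = 0; // start of the current line
  while (offset < text.length) {
    let end = text.indexOf('\n', offset);
    if (end === -1) end = text.length; // last line without a trailing "\n"
    // Drop a trailing "\r" so Windows line endings ("\r\n") are handled.
    const stop = end > offset && text[end - 1] === '\r' ? end - 1 : end;
    if (stop > offset) words.add(text.substring(offset, stop));
    offset = end + 1;
  }
  return words;
}

const sample = collectLines('alpha\r\nbeta\n\nalpha\ngamma');
console.log(sample.size);        // 3
console.log(sample.has('beta')); // true
```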

    • I also thought about this and tried it, but in the end it was twice as slow. Apparently the native split() works faster than checking for "\n" by hand. You can also join the discussion about the speed of objects versus Set collections in the comments on the previous answer (rextester.com/GKVS16394). - Vasya Shmarovoz