I need to split a large file (1,600,000 lines) into lines and store them in a collection of unique values. The file is loaded in JavaScript, and its text is passed to a C++ module, which splits it into lines. The resulting collection is then returned to JS. Here is the code I managed to write:
C++:

    #include <node.h>
    #include <v8.h>
    #include <string>
    #include <sstream>
    #include <iterator>
    #include <vector>

    using namespace v8;

    // Split function --------------------------------------------------------------
    template<typename Out>
    void split(const std::string &s, char delim, Out result) {
        std::stringstream ss;
        ss.str(s);
        std::string item;
        while (std::getline(ss, item, delim)) {
            *(result++) = item;
        }
    }

    std::vector<std::string> split(const std::string &s, char delim) {
        std::vector<std::string> elems;
        split(s, delim, std::back_inserter(elems));
        return elems;
    }

    // Dictionary parse ------------------------------------------------------------
    void Parse(const FunctionCallbackInfo<Value>& args) {
        Isolate* isolate = args.GetIsolate();
        Local<Context> context = isolate->GetCurrentContext();

        // Set that collects the words
        Local<Set> Words = Set::New(isolate);

        // Check the argument type
        if (!args[0]->IsString()) {
            isolate->ThrowException(
                Exception::TypeError(
                    String::NewFromUtf8(isolate, "Wrong type of first argument")
                )
            );
            return;
        }

        // Convert the argument to a string
        String::Utf8Value arg(args[0]->ToString());
        std::string dictInput = std::string(*arg);

        // Split the string into a vector
        std::vector<std::string> v = split(dictInput, '\n');

        // Iterate over the vector
        std::vector<std::string>::iterator it = v.begin();
        for (it; it != v.end(); ++it) {
            // Store the word in the collection
            Words->Add(context, String::NewFromUtf8(isolate, (*it).c_str()));
        }

        // Return the resulting Set
        args.GetReturnValue().Set(Words);
    }

    // Module exports --------------------------------------------------------------
    void Init(Local<Object> exports, Local<Object> module) {
        NODE_SET_METHOD(module, "exports", Parse);
    }

    NODE_MODULE(addon, Init)

JavaScript:

    const fs = require('fs');
    const readDict = require('./build/Release/read-dictionary')

    let text = fs.readFileSync('./dictionary.txt', 'utf8')
    let WORDS = readDict(text);

The problem is that the C++ code takes more than 2200 ms, while equivalent JavaScript code runs in about 1550 ms, which is also very slow (and it takes only ten lines):

    const fs = require('fs');

    let WORDS = new Set()
    let text = fs.readFileSync('file.txt', 'utf8')
    let list = text.split('\n')
    for (let i = 0; i < list.length; i++) {
        // Faster than "WORDS = new Set(list)"
        WORDS.add(list[i])
    }

I tried writing this in C++ to speed up the processing, but so far without much success. I do not know C++ at all; today is my first day writing it. Is there any way to speed all of this up?
Thanks.
istringstream on it, read the lines one by one with getline, and pass each one directly to this method. - Harry
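
The comment's suggestion, reading lines with getline and adding each one to the collection without first building a vector, might look roughly like the sketch below. It is written in plain standard C++ so that it can be compiled on its own: the function name `uniqueLines` is hypothetical, and `std::unordered_set` is only a stand-in for the `v8::Set` used in the addon. Whether this is actually faster than the original would have to be measured.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Collect the unique lines of `text` in a single pass,
// without building an intermediate std::vector.
// Assumption: std::unordered_set stands in for the v8::Set of the addon.
std::unordered_set<std::string> uniqueLines(const std::string& text) {
    std::unordered_set<std::string> words;
    std::istringstream input(text);
    std::string line;

    // std::getline splits on '\n' by default and drops the delimiter.
    while (std::getline(input, line)) {
        // In the addon, this is where Words->Add(context, ...) would be called.
        words.insert(line);
    }
    return words;
}
```

For example, `uniqueLines("a\nb\na\n")` yields a set with the two elements "a" and "b".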