I use Parallel::ForkManager for parallel processing in Perl. The script itself is simple: a file is opened, each line is read in a while loop, and inside the loop the line is handed off via ForkManager to a function for processing. The problem is that when the input should produce 15 function calls, the output contains 15 + n results (sometimes one extra, sometimes ten), i.e. one of the calls in the loop gets duplicated.

Is there some way to check for this so that the data is not duplicated? The duplicated results differ slightly from the originals (a timestamp is involved and it is off by one second), so the duplicates cannot simply be filtered out of the array afterwards.

The script itself:

    use Parallel::ForkManager;
    use Text::ParseWords;
    use IPC::Shareable;

    my @wPrint = 0;
    my $handle = tie @wPrint, 'IPC::Shareable', { destroy => 'Yes' };
    my $fileLog = 'test.txt';

    $pm = new Parallel::ForkManager(5);

    sub myFunc {
        my ($s) = @_;
        my @arr = quotewords(":", 0, $s);
        $handle->shlock();
        push(@wPrint, $arr[0].":".$arr[1].":".$arr[2].":".$arr[3].":".$arr[4].":\n");
        $handle->shunlock();
    }

    open(my $file, '<:encoding(UTF-8)', $fileLog);
    while (my $row = <$file>) {
        my $pid = $pm->start and next;
        myFunc($row);
        $pm->finish;
    }
    close $file;

    $pm->wait_all_children;
    print @wPrint;
    IPC::Shareable->clean_up_all;

The data in the file looks like this:

    10.0.0.1:Name 1:0:0:0:
    10.0.0.2:Name 2:0:0:0:
    10.0.0.3:Name 3:0:0:0:
  • Note that in the example I gave you with IPC::Shareable, the tie returns a certain $handle, and before each use of the variable in the processes a lock is taken on that resource, which is released once the work with the variable is done. Without the locks, the contents of the variable are unpredictable. That may or may not turn out to be the cause here, but locking shared memory in a multi-process environment is strictly necessary. - Mike
  • @Mike I tried printing the result of myFunc with a plain print (i.e. without any shared variable to store the data), and the result is the same: the duplicates are there from the start. I will add the point about buffering now. - Firsim
  • If the function is run without ForkManager, there are no duplicates, i.e. everything is processed line by line exactly as recorded in the file, with no double calls. - Firsim

2 answers

I cannot say exactly why this happens, but when fork is involved, the main read loop over $file sometimes re-reads the same data. I suspect it comes down to how Perl buffers filehandles across fork: the child process receives a copy of the open descriptor and something happens to the shared file position. The cure is to pre-read the whole input file into an array and run the loop over the data from that array:

    open(my $file, '<:encoding(UTF-8)', $fileLog);
    my @rows = do { local $/; split(/\n/, <$file>) };
    close $file;

    foreach my $row (@rows) {
        my $pid = $pm->start and next;
        myFunc($row);
        $pm->finish;
    }
  • Thank you! That did indeed cure the problem. - Firsim

I will add a couple of things. First, there is a simpler way to read the file into an array :)

    my @rows = <$file>;

Second, it is inhumane to slurp the whole file into memory. It is more correct to read it from STDIN, which at the same time solves the problem with the shared file offset:

    while (my $row = <STDIN>) {
        chomp $row;
        next unless $row;        # skip empty lines
        $pm->start and next;     # parent moves on to the next row; child continues below
        myFunc($row);
        $pm->finish;
    }
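With this approach the script no longer opens the file itself; the log is piped to it on standard input, e.g. cat test.txt | perl script.pl (the script name here is just a placeholder).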

And most importantly: the approach is wrong in principle. For small files, parallel processing makes no sense at all, and for large files a fork per row is unlikely to improve processing efficiency. I would fork a fixed number of children up front and feed them the data from the parent process through some kind of queue, as sketched below.
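A minimal sketch of that idea, assuming the test.txt file and the five-worker limit from the question; myFunc here is a stand-in that just prints, where the question's version pushes to the shared array. The parent pre-forks the pool and acts as the queue manager itself: each worker gets its own pipe, and the rows are dealt out round-robin.

    use strict;
    use warnings;
    use IO::Handle;

    my $workers = 5;                 # same limit as Parallel::ForkManager(5)
    my $fileLog = 'test.txt';
    my @writers;

    sub myFunc {                     # stand-in: the question's version pushes to the shared array
        my ($s) = @_;
        print "worker $$ processed: $s";
    }

    # Pre-fork the pool; every worker reads rows from its own pipe.
    for (1 .. $workers) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {                       # child
            close $writer;
            close $_ for @writers;             # drop write ends inherited from earlier iterations
            binmode $reader, ':encoding(UTF-8)';
            while (my $row = <$reader>) {
                myFunc($row);
            }
            exit 0;                            # EOF on the pipe ends the worker
        }
        close $reader;                         # parent keeps only the write end
        $writer->autoflush(1);
        binmode $writer, ':encoding(UTF-8)';   # keep the UTF-8 layer across the pipe
        push @writers, $writer;
    }

    # Only the parent reads the file; rows are dealt round-robin to the workers.
    open(my $file, '<:encoding(UTF-8)', $fileLog) or die "open: $!";
    my $n = 0;
    while (my $row = <$file>) {
        print { $writers[$n++ % $workers] } $row;
    }
    close $file;

    close $_ for @writers;                     # closing the "queues" lets the children see EOF
    wait() for 1 .. $workers;                  # reap the whole pool

Since only the parent ever touches test.txt and each child reads solely from its own pipe, the duplicated reads cannot occur; a proper queue module would serve the same purpose as the per-worker pipes.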