Why are regular expressions in Perl 40% faster than in Go? This is part of a simple log analysis script.

Perl:

$l =~ s/\A([^\s]+?) - - \[([^\]]+?)\] \"([^\"]+?)\" ([^\s]+?) ([^\s]+?) \"([^\"]+?)\"(.+)/$1\n$2\n$3\n$4\n$5\n$6\n$7/g;
($ip, $time, $page, $code, $size, $ref, $agent, $els) = split(/\n/, $l);
$page =~ s/(GET|HEAD|POST) (.+) (HTTP.+)/$2/;
$hash{$page}++;
# this is faster than the Go version (by 35-40%)

Go:

  log_format := `^([^ ]+) (-) (-) \[([^\]]+)\] "([^\"]+?)" ([0-9]+) ([^ ]+) "([^"]*)" "([^"]*)"`
  logParser := regexp.MustCompilePOSIX(log_format)
  log_format_get := `^(GET|HEAD|POST) (.+) (HTTP.+)$`
  logParserGet := regexp.MustCompilePOSIX(log_format_get)
  var hash = make(map[string]int)
  analize1 := func(iline *string) {
    // FindStringSubmatch returns nil when the line does not match.
    submatch := logParser.FindStringSubmatch(strings.TrimSpace(*iline))
    if len(submatch) > 0 {
      // submatch[5] is the request field, e.g. "GET /path HTTP/1.1".
      pg := logParserGet.FindAllStringSubmatch(strings.TrimSpace(submatch[5]), 1)
      if len(pg) > 0 {
        hash[pg[0][2]]++
      }
    }
  }
  • In the Go version you call FindAllStringSubmatch on the result of FindSubmatch, so every line goes through a nested second regex match. That can cause a significant slowdown compared to split, which involves no nested regex call. Avoid nested matching if you want to save time. Perhaps the regex can be adjusted so that a single pass extracts everything, so that the work is done by the engine core rather than by script-level code; then it will be faster. - nick_n_a
  • 1) First, build proper profiling tests that exclude the influence of everything else, then compare them. 2) Doesn't the name Perl itself tell you something? - PinkTux
  • TrimSpace does not affect speed. - Dim900
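
The single-pass idea from the comments above could look roughly like the sketch below. The names lineRE and requestPath and the sample log lines are hypothetical, and the pattern assumes the request path contains no spaces (the original used (.+) there), so treat it as an illustration rather than a drop-in replacement.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is a hypothetical single-pass pattern: it captures the request path
// directly, so no second regexp has to run on the request field.
// Assumption: the path contains no spaces.
var lineRE = regexp.MustCompile(
	`^\S+ - - \[[^\]]+\] "(?:GET|HEAD|POST) (\S+) HTTP[^"]*" \d+ \S+`)

// requestPath returns the requested path, or "" if the line does not match.
func requestPath(line string) string {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Made-up sample lines in the common access-log layout.
	lines := []string{
		`127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"`,
		`127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/4.08"`,
		`not a log line`,
	}
	hits := make(map[string]int)
	for _, l := range lines {
		if p := requestPath(l); p != "" {
			hits[p]++
		}
	}
	fmt.Println(hits)
}
```

Whether this actually closes the gap with Perl is not something the thread establishes; it only removes the second match per line.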

2 answers

It is commonly said that regular-expression optimization in Go is worse than in Perl, because Perl is older and more person-hours have been invested in its development.

Here is a speed test across different languages (the regexp test).

  • That chart looks wrong somehow, or at least uninformative. 1) PHP is missing from it; it would take one of the top places, right after Perl. 2) Perl processes regular expressions faster than V8 / JS, I checked. 3) Perl does not differ that much from C, because the pcre module itself is native code. But in general, yes: it all depends on how well the engine itself is optimized in a particular language. - ReinRaus
  • @PeterSmith you're right) - Paulo Berezini
  • Found the source of the picture -_- citforum.ru/news/26547 Some kind of black box was tested :) - ReinRaus
  • @ReinRaus I read in English) - Paulo Berezini

Regular expressions are not Go's strong suit; among the simpler languages only PHP shows a good result. The Benchmarks Game is well suited to forming an overall picture of the "speed" of languages.