I'll just add the assembly listings to complement @Barmaley's deleted answer.
    ; PerfTest.func1 (c2)
    0x00007fc219111be0: mov DWORD PTR [rsp-0x14000],eax
    0x00007fc219111be7: push rbp
    0x00007fc219111be8: sub rsp,0x20              ;*synchronization entry
                                                  ; - PerfTest::func1@-1 (line 16)
    0x00007fc219111bec: mov ebp,esi
    0x00007fc219111bee: dec ebp                   ;*iinc
                                                  ; - PerfTest::func1@0 (line 16)
    0x00007fc219111bf0: test ebp,ebp
    0x00007fc219111bf2: jle 0x00007fc219111c04    ;*ifle
                                                  ; - PerfTest::func1@4 (line 16)
    0x00007fc219111bf4: add esi,0xfffffffe        ;*iinc
                                                  ; - PerfTest::func1@0 (line 16)
                                                  ; - PerfTest::func1@8 (line 17)
    0x00007fc219111bf7: test esi,esi
    0x00007fc219111bf9: jle 0x00007fc219111c04    ;*ifle
                                                  ; - PerfTest::func1@4 (line 16)
                                                  ; - PerfTest::func1@8 (line 17)
    0x00007fc219111bfb: mov DWORD PTR [rsp],esi
    0x00007fc219111bfe: nop
    0x00007fc219111bff: call 0x00007fc219046420   ; OopMap{off=36}
                                                  ;*invokestatic func1
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; {static_call}
    0x00007fc219111c04: mov eax,ebp
    0x00007fc219111c06: add rsp,0x20
    0x00007fc219111c0a: pop rbp
    0x00007fc219111c0b: test DWORD PTR [rip+0x171c43ef],eax   # 0x00007fc2302d6000
                                                  ; {poll_return}
    0x00007fc219111c11: ret                       ;*invokestatic func1
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; - PerfTest::func1@8 (line 17)
    ; PerfTest.func2 (c2)
    0x00007fc9f110ed80: sub rsp,0x18
    0x00007fc9f110ed87: mov QWORD PTR [rsp+0x10],rbp   ;*synchronization entry
                                                  ; - PerfTest::func2@-1 (line 21)
    0x00007fc9f110ed8c: mov eax,esi
    0x00007fc9f110ed8e: dec eax                   ;*iinc
                                                  ; - PerfTest::func2@0 (line 21)
    0x00007fc9f110ed90: test eax,eax
    0x00007fc9f110ed92: jle 0x00007fc9f110eda2
    0x00007fc9f110ed94: add esi,0xfffffffe
    0x00007fc9f110ed97: test esi,esi
    0x00007fc9f110ed99: jle 0x00007fc9f110eda0
    0x00007fc9f110ed9b: mov eax,0x1
    0x00007fc9f110eda0: dec eax                   ;*ifle
                                                  ; - PerfTest::func2@4 (line 21)
    0x00007fc9f110eda2: add rsp,0x10
    0x00007fc9f110eda6: pop rbp
    0x00007fc9f110eda7: test DWORD PTR [rip+0x16405253],eax   # 0x00007fca07514000
                                                  ; {poll_return}
    0x00007fc9f110edad: ret
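To make the listings easier to follow, here is my guess at what the two methods looked like in source, reconstructed purely from the bytecode offsets and line annotations above (the original PerfTest source is not reproduced in this answer, so treat the exact wording as an assumption):

    static int func1(int n) {    // listing above: the call survives compilation
        if (--n > 0)             // line 16: iinc@0, ifle@4
            func1(n);            // line 17: invokestatic func1@8, result apparently discarded
        return n;
    }

    static int func2(int n) {    // listing above: no call at all
        while (--n > 0)          // line 21: iinc@0, ifle@4
            ;                    // empty loop body
        return n;
    }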
What does this show? The recursion has not gone anywhere, even though here it could legally have been optimized away: the call instruction is present in the first listing but absent from the second. If I read the addresses correctly, the call does not even target the same code region but some kind of stub, from which another call (or, less likely, a jmp) eventually returns control to compiled code (most likely this is the runtime's method-call resolution code). A call is the machine-level counterpart of a method invocation: besides the jump itself, you have to save the return address, preserve whatever processor state must be restored later, and set up the stack and frame pointers correctly; some of this happens automatically, the rest is dictated by the calling convention and requires extra instructions. Compared with the rest of the code (a plain decrement, a single instruction), these are fairly expensive operations, so they do affect performance. On top of that, jumping around in the code creates further opportunities for slowdown: a small piece of code without recursion is almost guaranteed to stay in the processor cache and not be reloaded from RAM all the time, whereas code with recursion and cross-code jumps can be evicted from the fast caches, and fetching it back from slower caches or RAM costs noticeably more time than a simple register decrement.
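To make that overhead concrete, these are the instructions from the func1 listing above that exist only because func1 actually performs a call (the comments are my annotations, not part of the PrintAssembly output, and reading the first store as a stack-overflow check is an assumption on my part):

    mov DWORD PTR [rsp-0x14000],eax   ; touch the stack far below rsp: looks like the stack-overflow ("stack banging") check of a non-leaf method
    push rbp                          ; save the caller's frame pointer
    sub rsp,0x20                      ; set up this invocation's stack frame
    ...
    call 0x00007fc219046420           ; the recursive call: push the return address and jump away
    ...
    add rsp,0x20                      ; tear the frame down again
    pop rbp                           ; restore the caller's frame pointer
    ret                               ; pop the return address and jump back

func2 also sets up a small frame, but it has no call, no stack bang and nothing to keep alive across one, so what remains between prologue and epilogue is essentially just the decrement and the test.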
I'll note that this is, frankly, a very inefficient way to assess performance: in this mode you can only compare the number and the kind of instructions executed in each case. The approach is useful for getting to the bottom of the low-level causes of a slowdown, but nothing guarantees that the JVM on another machine won't compile the code into different instructions. In this particular case I am simply confirming the hypothesis that the recursion has not gone away (not in the most efficient way) and providing some ground for further hypotheses; honestly, I mostly just wanted an excuse to play with hsdis. As a bonus, you can see that the JIT does its own thing: in places the listings repeat the decrement and the value test, so either PrintAssembly is producing incorrect output, or those operations really are duplicated in the generated code for some reason.
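For anyone who wants to reproduce listings like these: with the hsdis disassembler library on the JVM's library path, the diagnostic flags below request the disassembly (the class name PerfTest and the way it is launched are assumptions here, only the flags themselves matter):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly PerfTest

    # or, to keep the output down to the method of interest:
    java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,PerfTest::func1 PerfTest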
P.S. Now someone simply has to post and analyze the bytecode.
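For whoever takes that up, javap will dump it from the compiled class file (class name assumed as above): -c prints the bytecode of each method, -p includes non-public ones as well.

    javap -c -p PerfTest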