I'll just add the assembly listings to complement @Barmaley's deleted answer.
    ; PerfTest.func1 (c2)
    0x00007fc219111be0: mov DWORD PTR [rsp-0x14000],eax
    0x00007fc219111be7: push rbp
    0x00007fc219111be8: sub rsp,0x20              ;*synchronization entry
                                                  ; - PerfTest::func1@-1 (line 16)
    0x00007fc219111bec: mov ebp,esi
    0x00007fc219111bee: dec ebp                   ;*iinc
                                                  ; - PerfTest::func1@0 (line 16)
    0x00007fc219111bf0: test ebp,ebp
    0x00007fc219111bf2: jle 0x00007fc219111c04    ;*ifle
                                                  ; - PerfTest::func1@4 (line 16)
    0x00007fc219111bf4: add esi,0xfffffffe        ;*iinc
                                                  ; - PerfTest::func1@0 (line 16)
                                                  ; - PerfTest::func1@8 (line 17)
    0x00007fc219111bf7: test esi,esi
    0x00007fc219111bf9: jle 0x00007fc219111c04    ;*ifle
                                                  ; - PerfTest::func1@4 (line 16)
                                                  ; - PerfTest::func1@8 (line 17)
    0x00007fc219111bfb: mov DWORD PTR [rsp],esi
    0x00007fc219111bfe: nop
    0x00007fc219111bff: call 0x00007fc219046420   ; OopMap{off=36}
                                                  ;*invokestatic func1
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; {static_call}
    0x00007fc219111c04: mov eax,ebp
    0x00007fc219111c06: add rsp,0x20
    0x00007fc219111c0a: pop rbp
    0x00007fc219111c0b: test DWORD PTR [rip+0x171c43ef],eax   # 0x00007fc2302d6000
                                                  ; {poll_return}
    0x00007fc219111c11: ret                       ;*invokestatic func1
                                                  ; - PerfTest::func1@8 (line 17)
                                                  ; - PerfTest::func1@8 (line 17)
    ; PerfTest.func2 (c2)
    0x00007fc9f110ed80: sub rsp,0x18
    0x00007fc9f110ed87: mov QWORD PTR [rsp+0x10],rbp   ;*synchronization entry
                                                  ; - PerfTest::func2@-1 (line 21)
    0x00007fc9f110ed8c: mov eax,esi
    0x00007fc9f110ed8e: dec eax                   ;*iinc
                                                  ; - PerfTest::func2@0 (line 21)
    0x00007fc9f110ed90: test eax,eax
    0x00007fc9f110ed92: jle 0x00007fc9f110eda2
    0x00007fc9f110ed94: add esi,0xfffffffe
    0x00007fc9f110ed97: test esi,esi
    0x00007fc9f110ed99: jle 0x00007fc9f110eda0
    0x00007fc9f110ed9b: mov eax,0x1
    0x00007fc9f110eda0: dec eax                   ;*ifle
                                                  ; - PerfTest::func2@4 (line 21)
    0x00007fc9f110eda2: add rsp,0x10
    0x00007fc9f110eda6: pop rbp
    0x00007fc9f110eda7: test DWORD PTR [rip+0x16405253],eax   # 0x00007fca07514000
                                                  ; {poll_return}
    0x00007fc9f110edad: ret
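To make the listings easier to follow, here is my guess at what the two methods looked like in source, reconstructed purely from the bytecode offsets and line annotations above (the original PerfTest source is not reproduced in this answer, so treat the exact wording as an assumption):

    static int func1(int n) {    // listing above: the call survives compilation
        if (--n > 0)             // line 16: iinc@0, ifle@4
            func1(n);            // line 17: invokestatic func1@8, result apparently discarded
        return n;
    }

    static int func2(int n) {    // listing above: no call at all
        while (--n > 0)          // line 21: iinc@0, ifle@4
            ;                    // empty loop body
        return n;
    }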
What does this show? The recursion has not gone anywhere, even though here it could legally have been optimized away: the call instruction is present in the first listing but absent from the second. If I read the addresses correctly, the call does not even target the same code region but some kind of stub, from which another call (or, less likely, a jmp) eventually returns control to compiled code (most likely this is the runtime's method-call resolution code). A call is the machine-level counterpart of a method invocation: besides the jump itself, you have to save the return address, preserve whatever processor state must be restored later, and set up the stack and frame pointers correctly; some of this happens automatically, the rest is dictated by the calling convention and requires extra instructions. Compared with the rest of the code (a plain decrement, a single instruction), these are fairly expensive operations, so they do affect performance. On top of that, jumping around in the code creates further opportunities for slowdown: a small piece of code without recursion is almost guaranteed to stay in the processor cache and not be reloaded from RAM all the time, whereas code with recursion and cross-code jumps can be evicted from the fast caches, and fetching it back from slower caches or RAM costs noticeably more time than a simple register decrement.
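To make that overhead concrete, these are the instructions from the func1 listing above that exist only because func1 actually performs a call (the comments are my annotations, not part of the PrintAssembly output, and reading the first store as a stack-overflow check is an assumption on my part):

    mov DWORD PTR [rsp-0x14000],eax   ; touch the stack far below rsp: looks like the stack-overflow ("stack banging") check of a non-leaf method
    push rbp                          ; save the caller's frame pointer
    sub rsp,0x20                      ; set up this invocation's stack frame
    ...
    call 0x00007fc219046420           ; the recursive call: push the return address and jump away
    ...
    add rsp,0x20                      ; tear the frame down again
    pop rbp                           ; restore the caller's frame pointer
    ret                               ; pop the return address and jump back

func2 also sets up a small frame, but it has no call, no stack bang and nothing to keep alive across one, so what remains between prologue and epilogue is essentially just the decrement and the test.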
I'll note that this is, frankly, a very inefficient way to assess performance: in this mode you can only compare the number and the kind of instructions executed in each case. The approach is useful for getting to the bottom of the low-level causes of a slowdown, but nothing guarantees that the JVM on another machine won't compile the code into different instructions. In this particular case I am simply confirming the hypothesis that the recursion has not gone away (not in the most efficient way) and providing some ground for further hypotheses; honestly, I mostly just wanted an excuse to play with hsdis. As a bonus, you can see that the JIT does its own thing: in places the listings repeat the decrement and the value test, so either PrintAssembly is producing incorrect output, or those operations really are duplicated in the generated code for some reason.
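For anyone who wants to reproduce listings like these: with the hsdis disassembler library on the JVM's library path, the diagnostic flags below request the disassembly (the class name PerfTest and the way it is launched are assumptions here, only the flags themselves matter):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly PerfTest

    # or, to keep the output down to the method of interest:
    java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,PerfTest::func1 PerfTest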
P.S. Now someone simply has to post and analyze the bytecode.
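For whoever takes that up, javap will dump it from the compiled class file (class name assumed as above): -c prints the bytecode of each method, -p includes non-public ones as well.

    javap -c -p PerfTest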