Consider the x86-64 extension of the x86 architecture, with its four-level hierarchy of page tables for virtual memory.
On a TLB miss, the processor has to perform additional memory reads to walk the upper levels (PML4, PDP, PD) before it can read the PTE.
How can the PTE be obtained without these extra memory accesses, or at least with fewer of them? How can the walk be optimized? The question is theoretical.
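
To make the cost of the walk concrete, here is a minimal sketch of how a virtual address is split into the table indices, one memory read per level. It assumes 48-bit virtual addresses and 4 KiB pages; the names `va_indices` and `split_va` are made up for illustration and do not come from any real kernel or library.

```c
#include <stdint.h>

/* Indices taken from a 48-bit x86-64 virtual address (4 KiB pages assumed).
 * Each 9-bit index selects one of 512 entries in the table at that level. */
struct va_indices {
    unsigned pml4;   /* bits 47..39: entry in the PML4 table     */
    unsigned pdp;    /* bits 38..30: entry in the PDP table      */
    unsigned pd;     /* bits 29..21: entry in the page directory */
    unsigned pt;     /* bits 20..12: entry in the page table (the PTE) */
    unsigned offset; /* bits 11..0 : byte offset inside the page */
};

/* Hypothetical helper: decompose a virtual address into its walk indices. */
static struct va_indices split_va(uint64_t va)
{
    struct va_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);
    ix.pdp    = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}
```

For example, `split_va(0x00007f1234567abcULL)` gives the indices 0xFE, 0x48, 0x1A2 and 0x167 with page offset 0xABC. Two addresses that share the upper three indices go through the same PML4, PDP and PD entries, so only the last read differs between them.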