Where does an empty loop run faster?

Question

If the same loop is written once as a kernel module and once as an ordinary user-mode application, will it run at the same speed or not?

A good optimizer will throw an empty loop away at compile time.
@VladD And which will work faster if you compare a netfilter hook that replaces the destination address with some proxy server?
Hmm, this somewhat deviates from the original topic of the question.
I do not think you can talk about the speed of an arbitrary proxy server in general; it depends on the implementation details.
@Vladimir, correct your question so that it does not sound so delusional.

Mike · Accepted Answer · 2016-05-10T21:25:28

Why are you asking about some loop if you are actually interested in something else? The processor executes the same code at the same speed in both kernel and user space, provided that the code and all of its data are present in RAM.
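If you want to check this yourself from the user-space side, you can time the loop directly. The sketch below is only an illustration under assumptions: `spin` and `time_spin_ns` are hypothetical names, and the counter is declared `volatile` because, as noted in the comments on the question, an optimizer would otherwise remove an empty loop entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: run `iterations` rounds of a trivial loop.
 * `volatile` forces a real load and store on every round, so the
 * compiler cannot delete the loop. */
static uint64_t spin(uint64_t iterations)
{
    volatile uint64_t counter = 0;
    for (uint64_t i = 0; i < iterations; i++)
        counter = counter + 1;
    return counter;
}

/* Hypothetical helper: elapsed nanoseconds for one call to spin(),
 * measured with the monotonic clock. */
static int64_t time_spin_ns(uint64_t iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    spin(iterations);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

Calling `time_spin_ns(100000000ULL)` from `main` and printing the result gives the user-space number. The same loop body could be placed in a module's init function and timed with a kernel clock for comparison; that half is not shown here.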

The rest of the answer is based on your comments on the question.

The only question is what code has to be executed and whether a context switch is necessary. For a packet to reach the proxy server in user space and for the proxy to send it on, the following has to happen:

1. The kernel receives a hardware interrupt from the network card. The driver and other functions of the network stack build the sk_buff structure.
2. The IP stack determines that the packet is destined for an application in user space on this machine. The entire packet is copied from PL0 (kernel) memory to PL3 (process) memory.
3. A context switch to PL3 (user space) occurs. The operation itself is fairly expensive.
4. The proxy code parses the packet, decides that the request should be forwarded, builds the outgoing packet in its own memory, and calls the kernel's send function.
5. A context switch to PL0 (kernel) occurs.
6. The kernel copies the entire packet from PL3 to PL0.
7. The kernel adds the TCP / IP / link-layer headers and hands the packet to the driver for transmission.
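Seen from the proxy's side, steps 2 through 6 come down to two system calls and two copies of the payload. A minimal sketch, assuming plain blocking stream sockets; `relay_once` is a hypothetical helper name, not part of any real proxy:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: forward one chunk from in_fd to out_fd the way a
 * simple user-space proxy would.
 * Returns the number of bytes forwarded, 0 on end of stream, -1 on error. */
static ssize_t relay_once(int in_fd, int out_fd)
{
    char buf[2048];

    /* First system call: the kernel copies the payload from its own
     * memory into buf, which lives in the process's memory. */
    ssize_t received = recv(in_fd, buf, sizeof buf, 0);
    if (received <= 0)
        return received;

    /* A real proxy would parse and possibly rewrite the request here. */

    /* Second system call: the kernel copies buf back into kernel memory
     * before adding headers and passing the data to the driver. */
    ssize_t sent_total = 0;
    while (sent_total < received) {
        ssize_t sent = send(out_fd, buf + sent_total,
                            (size_t)(received - sent_total), 0);
        if (sent < 0)
            return -1;
        sent_total += sent;
    }
    return sent_total;
}
```

Every forwarded chunk pays for both calls and both copies, which is exactly the overhead that forwarding inside the kernel avoids.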

When the packet-forwarding code runs in the kernel, steps 2 through 6 are not needed, which removes both copies of the packet in memory and both context switches. That is why doing the work entirely in the kernel is much faster. A user-space application does not have to take the standard path, though: it can process packets using, for example, netmap, which at least avoids the copy operations by mapping kernel structures directly into the virtual memory of the PL3 process.

P.S. The memory protection levels are given for the Intel x86 architecture. On other architectures they are different, but the general sense of the required work is the same.

Thank you very much for your answers; they were very helpful in getting to grips with Linux kernel module programming.
As for why there is a different question in the comments: I did not want to create another similar topic.
