
Open source: code humor, code tricks, NOT code nonsense

Old GLib vs New Clang


While tinkering with a variety of open source software, I occasionally come across all sorts of interesting things: sometimes just a funny comment, sometimes something witty in a broader sense. Such collections periodically appear on the "global Internet" and on Habr. There is, say, a well-known StackOverflow question about comments in code, and a collection of funny names of legal entities and place names was recently published here. I'll try to structure and lay out what I have gradually accumulated. Below the cut are quotes from QEMU, the Linux kernel, and more.


Linux kernel


I suppose it's no secret to many that emails from the Linux Kernel Mailing List periodically get broken up into quotes. So let's look at the code instead. Right away, the kernel build system greets us with a surprise. As you know, projects built with Autoconf have a Makefile with two standard cleanup targets: clean and distclean. Naturally, the kernel is not built with Autoconf (menuconfig alone speaks volumes), so there are more targets here: clean, distclean and mrproper. Yes, Mr. Proper, the kernel cleaner that cleans twice as fast.


Speaking of the configuration system: long ago I was surprised to stumble upon a mysterious target, allrandconfig. It sits alongside such clear targets as allnoconfig, allyesconfig (I suspect this pulls in a lot of heavy debugging code, so I wouldn't risk booting the result on real hardware...) and allmodconfig. "They're mocking me," I thought. Later I told an acquaintance about this observation. He replied that it is probably quite a meaningful command, just not for actual builds. Instead, it tests whether the dependencies between options are correct; as we'd say today, it fuzzes the configuration parameters.
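
To make the fuzzing idea concrete, here is a toy sketch in C. It is not Kconfig. The three options (OPT_NET, OPT_DEBUGFS, OPT_WIFI_DEBUG), the builds() check and the planted missing dependency are all hypothetical. The sketch generates random configurations that respect the declared dependencies until one of them "fails to build". That failure exposes the dependency nobody declared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mini-Kconfig: options listed so that dependencies come first. */
enum { OPT_NET, OPT_DEBUGFS, OPT_WIFI_DEBUG, NB_OPTS };

/* Declared "depends on" per option (-1 = none). The planted bug: the
 * WIFI_DEBUG code also needs DEBUGFS, but only NET was declared. */
static const int declared_dep[NB_OPTS] = { -1, -1, OPT_NET };

/* xorshift32 PRNG, so a failing run can be reproduced from its seed. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* One random configuration that honours the declared dependencies. */
static void random_config(bool cfg[NB_OPTS], uint32_t *state)
{
    for (int i = 0; i < NB_OPTS; i++) {
        bool dep_ok = declared_dep[i] < 0 || cfg[declared_dep[i]];
        bool coin = (next_rand(state) & 1u) != 0;
        cfg[i] = dep_ok && coin;
    }
}

/* Stand-in for "does this configuration build?": it encodes the real
 * requirement that the declared dependencies forgot. */
static bool builds(const bool cfg[NB_OPTS])
{
    return !cfg[OPT_WIFI_DEBUG] || cfg[OPT_DEBUGFS];
}

/* The fuzz loop: index of the first random config that fails, or -1. */
static long find_broken_config(uint32_t seed, long max_tries)
{
    uint32_t state = seed != 0 ? seed : 1u; /* xorshift state must be non-zero */
    bool cfg[NB_OPTS];
    for (long t = 0; t < max_tries; t++) {
        random_config(cfg, &state);
        if (!builds(cfg))
            return t;
    }
    return -1;
}
```

Every generated configuration is "legal" according to the declared dependencies. Only the broken builds reveal that the declarations are incomplete.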


However, there is life in the kernel beyond the build system: its documentation sometimes has not only technical but also a certain artistic value. Suppose you want to warn hibernation users that it is fragile and that data can be lost if certain rules are not followed. I would glumly write something like ATTENTION: <insert a couple of the most boring lines>. But the developer who wrote this took a different approach:


    Some warnings, first.

     * BIG FAT WARNING *********************************************************
     *
     * If you touch anything on disk between suspend and resume...
     *                              ...kiss your data goodbye.
     *
     * If you do resume from initrd after your filesystems are mounted...
     *                              ...bye bye root partition.
     *                      [this is actually same case as above]
     *
     * ...

Little tricks


It's no surprise that not all code can be compiled with optimizations. When I tried forcing them on for every object file, I predictably ran into some entropy source or the like that raised an #error if optimization was enabled. Well, cryptography is like that. Now, how about code that does not compile if you turn off all optimizations, inlining and so on? How is that even possible? With a static assert like this one:


    /* SPDX-License-Identifier: GPL-2.0 */
    // ...
    /*
     * This function doesn't exist, so you'll get a linker error if
     * something tries to do an invalidly-sized xchg().
     */
    extern void __xchg_called_with_bad_pointer(void);

    static inline
    unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
    {
            unsigned long ret, flags;

            switch (size) {
            case 1:
    #ifdef __xchg_u8
                    return __xchg_u8(x, ptr);
    #else
                    local_irq_save(flags);
                    ret = *(volatile u8 *)ptr;
                    *(volatile u8 *)ptr = x;
                    local_irq_restore(flags);
                    return ret;
    #endif /* __xchg_u8 */
            // ...
            default:
                    __xchg_called_with_bad_pointer();
                    return x;
            }
    }

The assumption, apparently, is that any call with a constant size argument folds down to a single branch of the switch. For a valid size, that branch is not default. Without optimization, then, this function causes a link error practically by design...
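
For contrast, here is a sketch of a check that rejects bad sizes at compile time regardless of the optimization level. It uses the classic negative-array-size trick, the idea behind older versions of the kernel's BUILD_BUG_ON, reimplemented under hypothetical names (BUILD_CHECK_ZERO, XCHG_SIZE_OK, XCHG). The exchange itself is deliberately plain and non-atomic; only the size dispatch mirrors __xchg().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Yields (size_t)0 when cond is true and fails to compile (negative array
 * size) when it is false. cond must be an integer constant expression. */
#define BUILD_CHECK_ZERO(cond) (sizeof(char[1 - 2 * !(cond)]) - 1)

#define XCHG_SIZE_OK(n) ((n) == 1 || (n) == 2 || (n) == 4 || (n) == 8)

/* Plain, NON-atomic exchange: stores x into *ptr and returns the old value. */
static uint64_t xchg_sized(volatile void *ptr, uint64_t x, size_t size)
{
    uint64_t old = 0;
    switch (size) {
    case 1: old = *(volatile uint8_t *)ptr;  *(volatile uint8_t *)ptr  = (uint8_t)x;  break;
    case 2: old = *(volatile uint16_t *)ptr; *(volatile uint16_t *)ptr = (uint16_t)x; break;
    case 4: old = *(volatile uint32_t *)ptr; *(volatile uint32_t *)ptr = (uint32_t)x; break;
    case 8: old = *(volatile uint64_t *)ptr; *(volatile uint64_t *)ptr = x;           break;
    default: break; /* unreachable through XCHG(): bad sizes do not compile */
    }
    return old;
}

/* The size check happens at compile time, so it also works at -O0. */
#define XCHG(p, x)                                   \
    xchg_sized((p), (uint64_t)(x),                   \
               sizeof(*(p)) + BUILD_CHECK_ZERO(XCHG_SIZE_OK(sizeof(*(p)))))
```

With this macro, something like XCHG on a pointer to a 3-byte struct is rejected by the compiler itself rather than by the linker.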


Did you know



QEMU


When I read Robert Love's book on the design of the Linux kernel and then dug into the QEMU source code, I had a certain feeling of déjà vu. There were lists embedded in structures by value (rather than linked through pointers, as taught in introductory programming courses). There was a certain RCU subsystem (I didn't fully understand what it is, but the kernel has one too). And there was probably much more of the same.
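
For readers who, like me, learned lists via pointers, here is a minimal sketch of an intrusive list in C. The link node is a field inside the payload struct. A container_of-style macro (modelled on the kernel's macro of the same name, simplified here) recovers the enclosing struct from the link's address via offsetof. All names are illustrative; this is not QEMU's or the kernel's list API.

```c
#include <assert.h>
#include <stddef.h>

/* The link lives inside the payload struct, by value. */
struct list_node {
    struct list_node *prev, *next;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A circular list with a sentinel head: an empty list points at itself. */
static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Example payload: no separate "list cell" allocation is needed. */
struct task {
    int id;
    struct list_node link;
};

static int sum_ids(const struct list_node *head)
{
    int sum = 0;
    for (const struct list_node *n = head->next; n != head; n = n->next)
        sum += container_of(n, struct task, link)->id;
    return sum;
}
```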


What is the first thing a tidy person wants to get acquainted with on a project? Probably the coding style. And already in this, one might say, ceremonial document we find:


    1. Whitespace

    Of course, the most important aspect in any coding style is whitespace.
    Crusty old coders who have trouble spotting the glasses on their noses
    can tell the difference between a tab and eight spaces from a distance
    of approximately fifteen parsecs.  Many a flamewar has been fought and
    lost on this issue.

It also addresses the perennial question of maximum line length:


    Lines should be 80 characters; try not to make them longer.
    ...
    Rationale:
     - Some people like to tile their 24" screens with a 6x4 matrix of 80x24
       xterms and use vi in all of them.  The best way to punish them is to
       let them keep doing it.
    ...

(Hmm... that's twice as many on each axis as I sometimes use. Is this some kind of Linux HD?)


There's plenty more of interest in there; give it a read.


Tricks again


They say C is a low-level language. But with enough perversion, you can work wonders of compile-time code generation without any Scala or even C++.


For example, the QEMU codebase has a file called softmmu_template.h. When I saw the name, I assumed it was meant to be copied into my own TCG backend and edited until it became a correct TLB implementation. How wrong I was! Here is how it is actually meant to be used:


accel/tcg/cputlb.h:


    #define DATA_SIZE 1
    #include "softmmu_template.h"

    #define DATA_SIZE 2
    #include "softmmu_template.h"

    #define DATA_SIZE 4
    #include "softmmu_template.h"

    #define DATA_SIZE 8
    #include "softmmu_template.h"
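
A single code block can't ship a separate header to #include repeatedly. The sketch below therefore expresses the same "instantiate one body per data size" idea with a function-generating macro. It is an analogue of the trick, not how softmmu_template.h is written; the DATA_TYPE_n helpers and the load_n functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a size in bytes to a C type, much as DATA_SIZE selects one per pass. */
#define DATA_TYPE_1 uint8_t
#define DATA_TYPE_2 uint16_t
#define DATA_TYPE_4 uint32_t
#define DATA_TYPE_8 uint64_t

/* The "template body": a load of SIZE bytes from host memory, named load_<SIZE>.
 * memcpy keeps it safe for unaligned pointers. */
#define DEFINE_LOAD(SIZE)                                      \
    static DATA_TYPE_##SIZE load_##SIZE(const void *p)         \
    {                                                          \
        DATA_TYPE_##SIZE v;                                    \
        memcpy(&v, p, sizeof v);                               \
        return v;                                              \
    }

/* Four "inclusions" of the template, one per data size. */
DEFINE_LOAD(1)
DEFINE_LOAD(2)
DEFINE_LOAD(4)
DEFINE_LOAD(8)
```

The header-based version scales better when the body is hundreds of lines long. A multi-line macro like this one gets painful to debug quickly.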

As you can see: sleight of hand, and no C++. But this is a fairly simple example. How about something more involved?


There is a file called tcg/tcg-opc.h. Its contents are rather mysterious and look something like this:


    ...
    DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
    DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
    DEF(setcond_i32, 1, 2, 1, 0)
    DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
    /* load/store */
    DEF(ld8u_i32, 1, 1, 1, 0)
    DEF(ld8s_i32, 1, 1, 1, 0)
    DEF(ld16u_i32, 1, 1, 1, 0)
    DEF(ld16s_i32, 1, 1, 1, 0)
    ...

In fact, it's all very simple. It is used like this:


tcg/tcg.h:


    typedef enum TCGOpcode {
    #define DEF(name, oargs, iargs, cargs, flags) INDEX_op_ ## name,
    #include "tcg-opc.h"
    #undef DEF
        NB_OPS,
    } TCGOpcode;

Or like this:


tcg/tcg-common.c:


    TCGOpDef tcg_op_defs[] = {
    #define DEF(s, oargs, iargs, cargs, flags) \
             { #s, oargs, iargs, cargs, iargs + oargs + cargs, flags },
    #include "tcg-opc.h"
    #undef DEF
    };

It's almost strange that there are no other uses. And note that no tricky code-generation scripts are involved here: only C, only hardcore.
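
This pattern is usually called X-macros. Below is a self-contained sketch in which the list lives in a macro, OPCODE_LIST, that takes the per-entry macro as a parameter. QEMU instead keeps its list in a separate header, tcg-opc.h. The three opcodes and the OpDef struct are hypothetical stand-ins, not QEMU's definitions. The list is written once and expanded twice, into an enum and into a table indexed by that enum, so the two cannot drift apart.

```c
#include <assert.h>
#include <string.h>

/* The single source of truth: name, outputs, inputs, constant args.
 * Hypothetical opcodes; QEMU keeps its real list in tcg-opc.h. */
#define OPCODE_LIST(DEF)       \
    DEF(mov_i32, 1, 1, 0)      \
    DEF(add_i32, 1, 2, 0)      \
    DEF(setcond_i32, 1, 2, 1)

/* Expansion 1: one enum constant per entry, plus a count. */
typedef enum Opcode {
#define DEF_ENUM(name, oargs, iargs, cargs) INDEX_op_##name,
    OPCODE_LIST(DEF_ENUM)
#undef DEF_ENUM
    NB_OPS
} Opcode;

/* Expansion 2: a descriptor table in the same order as the enum. */
typedef struct OpDef {
    const char *name;
    int nb_oargs, nb_iargs, nb_cargs, nb_args;
} OpDef;

static const OpDef op_defs[] = {
#define DEF_TABLE(name, oargs, iargs, cargs) \
    { #name, oargs, iargs, cargs, oargs + iargs + cargs },
    OPCODE_LIST(DEF_TABLE)
#undef DEF_TABLE
};
```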


Did you know



Java, JVM and all-all-all


Why is it all about Linux with me? Let's talk about something cross-platform, the JVM for example. Many developers in this ecosystem have probably heard of GraalVM already. If you haven't, in two words: it's epic. So, with the tale of the Grail told, let's move on to the good old JVM.


Sometimes the JVM needs to stop all managed threads, whether for some intricate garbage collection phase or for something else. As luck would have it, threads can only be stopped at so-called safepoints. As explained here, an ordinary check of a global variable takes a long time and involves some shamanism with memory barriers. What did the developers do? They made do with a single read of the variable.


Almost like HQ9+

There is a joke language called HQ9+. It was created as a "very convenient educational programming language": it makes the typical tasks given to students very easy to perform:


  • on 'H', the interpreter prints Hello, World!
  • on 'Q', it prints the text of the program itself (a quine)
  • on '9', it prints the lyrics of 99 Bottles of Beer
  • on '+', it increments the accumulator by one
  • it can't do anything else, but why would it need to?..
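
Since the whole specification fits in a bullet list, so does an interpreter. Below is a minimal sketch in C. The function name hq9plus and the choice to silently ignore every other character are my assumptions, and '9' prints the common form of the song.

```c
#include <assert.h>
#include <stdio.h>

/* Prints "N bottle(s)" with the right plural, or "no more bottles" for zero. */
static void print_bottles(FILE *out, int n)
{
    if (n == 0)
        fputs("no more bottles", out);
    else
        fprintf(out, "%d bottle%s", n, n == 1 ? "" : "s");
}

/* Runs an HQ9+ program, writing its output to `out`.
 * Returns the final accumulator value, which HQ9+ never prints. */
static long hq9plus(const char *program, FILE *out)
{
    long accumulator = 0;
    for (const char *p = program; *p != '\0'; p++) {
        switch (*p) {
        case 'H':
            fputs("Hello, World!\n", out);
            break;
        case 'Q':
            fputs(program, out); /* the quine: echo our own source text */
            break;
        case '9':
            for (int n = 99; n > 0; n--) {
                print_bottles(out, n);
                fputs(" of beer on the wall, ", out);
                print_bottles(out, n);
                fputs(" of beer.\nTake one down and pass it around, ", out);
                print_bottles(out, n - 1);
                fputs(" of beer on the wall.\n\n", out);
            }
            fputs("No more bottles of beer on the wall, no more bottles of beer.\n"
                  "Go to the store and buy some more, 99 bottles of beer on the wall.\n",
                  out);
            break;
        case '+':
            accumulator++;
            break;
        default:
            break; /* this sketch silently ignores every other character */
        }
    }
    return accumulator;
}
```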

How does the JVM achieve its goal with a single instruction? Very simply: when a stop is needed, it revokes access to the memory page holding this variable. The threads crash with SIGSEGV, and the JVM parks them and resumes them once the "maintenance" is over. I remember that on StackOverflow the interview question "How do you crash a JVM?" got this answer:


JNI. In fact, with JNI, crashing is the default mode of operation. You have to work extra hard to get it not to crash.

All jokes aside, in the JVM this is sometimes literally the case.
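
The polling-page mechanism described above can be sketched in a few dozen lines of C. This is a single-threaded, Linux-oriented illustration under hypothetical names, not HotSpot code. The "safepoint check" is one load from a dedicated page, and a stop request revokes access to that page with mprotect. The resulting fault lands in a handler that, instead of parking the thread, just counts the stop and jumps back with siglongjmp.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS, sigaction, sigsetjmp */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile int *poll_page;            /* the "safepoint polling page" */
static size_t page_size;
static sigjmp_buf safepoint_env;
static volatile sig_atomic_t stops_taken;  /* how many times we "parked" */

/* Fault handler: a real VM would park the thread here until the pause ends;
 * this sketch only counts the stop and jumps back to the poll site. */
static void on_poll_fault(int sig)
{
    (void)sig;
    stops_taken++;
    siglongjmp(safepoint_env, 1);
}

static int safepoints_init(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    poll_page = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_poll_fault;
    sigemptyset(&sa.sa_mask);
    /* Linux reports the access as SIGSEGV; some systems raise SIGBUS instead. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* The whole fast path of a safepoint check: one load from the polling page.
 * Returns 1 if that load faulted, i.e. a stop had been requested. */
static int safepoint_poll(void)
{
    if (sigsetjmp(safepoint_env, 1) != 0)
        return 1;
    int value = *poll_page; /* volatile read: cannot be optimized away */
    (void)value;
    return 0;
}

/* "Stop the world": revoke access so that the next poll faults. */
static int request_stop(void)
{
    return mprotect((void *)poll_page, page_size, PROT_NONE);
}

/* "Maintenance" is over: make the page readable again. */
static int release_stop(void)
{
    return mprotect((void *)poll_page, page_size, PROT_READ);
}
```

The point of the design is that the common case costs a single ordinary load with no barriers. All the expense moves to the rare moment when a stop is actually requested.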


Well, since I mentioned code generation in Scala and we are now in that ecosystem, here's an interesting fact: code generation in Scala (the macro kind) works roughly like this. You write Scala code against the compiler API and compile it. The next time you run the compiler, you simply put the resulting code generator on the compiler's own classpath. When the compiler sees a special directive, it calls the generator and passes it the syntax trees of the call's arguments. In return, it gets back an AST to substitute at the call site.


Quirks of licensing ideologies


I like the free software ideology, but it has its funny quirks too.


Once, about ten years ago, I updated my Debian stable. Trying to remember the syntax of some command, I typed man <command> out of habit. In return I got an exhaustive description along the lines of "[program name] is a program whose documentation is licensed under the GNU GFDL with invariant sections, which is not DFSG-free." As if the program had been written by some evil proprietary vendors from some FSF... (The discussion is easy to google these days.)


And one small but important library is considered non-free by some distributions, because its author added a condition to a standard permissive license: the software must be used for good, not evil. Jokes aside, I too would probably be wary of such a thing in production. Who knows what the author's notions of good and evil are.


Miscellaneous


Peculiarities of national compiler building in the era of Moore's law


The stern LLVM developers have capped the supported alignment:


The maximum alignment is 1 << 29.

As they say, first it makes you laugh, and then it makes you think. The first thought is: who needs 512 MiB alignment? Then I read about kernel development in Rust, where they propose aligning the "page table" structure to 4096 bytes. And if you read Wikipedia, it goes even further:


48-bit space for more than 512 GB of memory (for about 0.195% of the 256 TB virtual space).
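
In C11, the same kind of request is spelled alignas. The sketch below assumes an x86-64-style table of 512 eight-byte entries; the struct name PageTable and its layout are illustrative. The static assertions also spell out the arithmetic behind the 1 << 29 cap quoted above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical x86-64-style page table: 512 eight-byte entries fill exactly
 * one 4096-byte page, and the table itself must start on a page boundary. */
typedef struct PageTable {
    alignas(4096) uint64_t entries[512];
} PageTable;

_Static_assert(sizeof(PageTable) == 4096, "one table fills one page");
_Static_assert(alignof(PageTable) == 4096, "tables are page-aligned");

/* For scale: the cap quoted above, 1 << 29 bytes, is 512 MiB. */
_Static_assert((1L << 29) == 512L * 1024 * 1024, "1 << 29 bytes is 512 MiB");

/* A statically allocated table; the toolchain honours the requested alignment. */
static PageTable root_table;
```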

Format version: how do you store it?


Once I decided to find out why export didn't work in a certain program, and it turned out that it did work... Or did it?


I ran the backend commands by hand and saw that everything was basically fine. The only problem was that the version had to be passed as "2.0", and what came out was simply "2". I expected a trivial fix in some string constant, but instead I discovered the function double getVersion(). Well, why not: there's a major, a minor, even the dot! In the end, though, the fix was not much harder than expected. I just increased the output precision, passing the data type and the strings through to where they were needed.
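
The article doesn't show the program's code, so here is a hedged reconstruction in C of the symptom and the fix. get_version() is a hypothetical stand-in, and "%g" is only an assumption about how the double was being printed: it drops the trailing ".0", while an explicit precision keeps it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the program's getVersion(): version 2.0 as a double. */
static double get_version(void)
{
    return 2.0;
}

/* Plausible shape of the bug (an assumption): "%g" drops a trailing ".0". */
static void format_version_lossy(char *buf, size_t len)
{
    snprintf(buf, len, "%g", get_version());
}

/* The kind of fix described: request the fractional digit explicitly. */
static void format_version_fixed(char *buf, size_t len)
{
    snprintf(buf, len, "%.1f", get_version());
}
```

A version stored as a double stays fragile even after this fix, because 2.1 and 2.10 are the same double. A pair of integers for major and minor avoids the problem entirely.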


About the difference between theorists and practitioners


I believe I've already seen on Habr a translation of an article about the smallest C program that crashes on startup but still builds: int main;. The symbol main exists, and technically control can be transferred to it. sirikid rightly pointed out that even the int is superfluous here. In general, even with a 9-byte program, it's better not to toss around claims that it's the smallest... Sure, the program crashes, but it fully complies with the rules.


So much for running what is supposed to run. How about running the unrunnable?


    $ ldd /bin/ls
        linux-vdso.so.1 (0x00007fff93ffa000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f0b27664000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0b2747a000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f0b27406000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0b27400000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0b278e9000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0b273df000)
    $ /lib/x86_64-linux-gnu/libc.so.6

...and libc answers in a human voice:


    GNU C Library (Ubuntu GLIBC 2.28-0ubuntu1) stable release version 2.28.
    Copyright (C) 2018 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
    PURPOSE.
    Compiled by GNU CC version 8.2.0.
    libc ABIs: UNIQUE IFUNC ABSOLUTE
    For bug reporting instructions, please see:
    <https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Programmers play golf


There's a whole StackExchange site devoted to Code Golf: competitions of the "solve this problem with the smallest penalty, where the penalty depends on the size of the source code" kind. The format itself invites very sophisticated solutions, but sometimes they get too sophisticated. That's why one of the questions collects the standard forbidden loopholes. I especially like this one:


Using MetaGolfScript
MetaGolfScript is a family of programming languages. For example, the empty program in MetaGolfScript-209180605381204854470575573749277224 prints "Hello, World!".

One line



Finally, where does the title of this article come from? It rephrases a trick from emcc, the Emscripten compiler:


    $ emcc --help
    ...
    emcc: supported targets: llvm bitcode, javascript, NOT elf
    (autoconf likes to see elf above to enable shared object support)


Source: https://habr.com/ru/post/437832/