
How the Swift compiler works. Part 3


We continue to study the Swift compiler. This part is devoted to the Swift Intermediate Language.


If you have not read the previous parts, I recommend reading them first.



SILGen


The next step is to convert the typed AST into raw SIL. Swift Intermediate Language (SIL) is an intermediate representation created specifically for Swift. A description of all its instructions can be found in the documentation.


SIL is in SSA form. Static Single Assignment (SSA) is a representation of code in which each variable is assigned a value exactly once. It is built from ordinary code by introducing additional variables, for example by adding a numeric suffix that denotes the version of the variable after each assignment.


This form makes it easier for the compiler to optimize the code. Below is an example in pseudocode. Obviously, the first line is unnecessary:


    a = 1
    a = 2
    b = a

But it is obvious only to us. Teaching the compiler to detect this would require non-trivial algorithms. With SSA it is much easier: even a simple compiler can now see that the value of the variable a1 is never used, so that line can be deleted:


    a1 = 1
    a2 = 2
    b1 = a2
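The renaming can be sketched in a few lines of Swift. This is a minimal illustration for straight-line code only (no branches), and every type and function name in it is hypothetical; it is not how the Swift compiler itself builds SSA.

```swift
// A toy program: each statement assigns a constant or copies a variable.
enum Operand {
    case constant(Int)
    case variable(String)
}

struct Assignment {
    let target: String
    let source: Operand
}

/// Renames variables so that every name is assigned exactly once (a1, a2, ...).
/// Handles straight-line code only, so no merging of values from branches is needed.
func toSSA(_ program: [Assignment]) -> [Assignment] {
    var version: [String: Int] = [:]
    var result: [Assignment] = []
    for statement in program {
        // A read refers to the latest version of the variable.
        let source: Operand
        switch statement.source {
        case .constant:
            source = statement.source
        case .variable(let name):
            source = .variable("\(name)\(version[name] ?? 0)")
        }
        // A write creates a new version of the target.
        let next = (version[statement.target] ?? 0) + 1
        version[statement.target] = next
        result.append(Assignment(target: "\(statement.target)\(next)", source: source))
    }
    return result
}

/// Returns the SSA names that are assigned but never read.
func unusedNames(in ssa: [Assignment]) -> [String] {
    var used = Set<String>()
    for statement in ssa {
        if case .variable(let name) = statement.source { used.insert(name) }
    }
    return ssa.map { $0.target }.filter { !used.contains($0) }
}

let program = [
    Assignment(target: "a", source: .constant(1)),
    Assignment(target: "a", source: .constant(2)),
    Assignment(target: "b", source: .variable("a")),
]
let ssa = toSSA(program)
let unused = unusedNames(in: ssa)
print(ssa.map { $0.target })  // ["a1", "a2", "b1"]
print(unused)                 // ["a1", "b1"]
```

In this fragment b1 is reported as well, simply because nothing reads it afterwards. A real compiler would keep values that are still needed after the fragment and delete only the truly dead a1.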

SIL allows you to apply specific optimizations and checks to the Swift code that would be difficult or impossible to implement at the AST stage.


Using SIL Generator


To generate SIL, the -emit-silgen flag is used:


 swiftc -emit-silgen main.swift 

The result of the command:


    sil_stage raw

    import Builtin
    import Swift
    import SwiftShims

    let x: Int

    // x
    sil_global hidden [let] @$S4main1xSivp : $Int

    // main
    sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
    bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
      alloc_global @$S4main1xSivp // id: %2
      %3 = global_addr @$S4main1xSivp : $*Int // user: %8
      %4 = metatype $@thin Int.Type // user: %7
      %5 = integer_literal $Builtin.Int2048, 16 // user: %7
      // function_ref Int.init(_builtinIntegerLiteral:)
      %6 = function_ref @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int // user: %7
      %7 = apply %6(%5, %4) : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int // user: %8
      store %7 to [trivial] %3 : $*Int // id: %8
      %9 = integer_literal $Builtin.Int32, 0 // user: %10
      %10 = struct $Int32 (%9 : $Builtin.Int32) // user: %11
      return %10 : $Int32 // id: %11
    } // end sil function 'main'

    // Int.init(_builtinIntegerLiteral:)
    sil [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int

SIL, like LLVM IR, can be printed in textual form. The output shows that imports of the Builtin, Swift, and SwiftShims modules were added at this stage.


Although Swift lets you write code directly in the global scope, SILGen generates a main function, the entry point of the program. All the code ends up inside it, except for the declaration of the constant: it is global and must be accessible everywhere.
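For reference, here is what the input file presumably looks like. This is inferred from the SIL output (a global constant x of type Int initialized with the literal 16), so treat the exact contents as an assumption.

```swift
// main.swift (assumed contents, inferred from the SIL listing)
// Top-level code: the source has no explicit main function.
// SILGen creates @main and places the initialization of x inside it.
let x = 16
```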


Most lines have a similar structure. On the left is a pseudo-register that stores the result of the instruction. Next come the instruction itself and its operands, and at the end a comment naming the register whose computation will use this one.


For example, this line creates an integer literal of type Int2048 with the value 16. The literal is stored in register %5 and will be used to compute the value of %7:


 %5 = integer_literal $Builtin.Int2048, 16 // user: %7 

A function declaration begins with the keyword sil. It is followed by the name with the @ prefix, the calling convention, the parameters, the return type, and the function body. For the initializer Int.init(_builtinIntegerLiteral:) the body is, of course, omitted: the function comes from another module, so it only needs to be declared, not defined. The dollar sign marks the start of a type:


    // Int.init(_builtinIntegerLiteral:)
    sil [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int

The calling convention specifies how a function must be called. This is needed to generate machine code. A detailed description of calling conventions is beyond the scope of this article.


The names of initializers, like the names of structures, classes, methods, and protocols, are mangled (name mangling). This solves several problems at once.


First, it allows the same name to be used in different modules and nested entities. For example, the first fff method gets the name S4main3AAAV3fffSiyF, and the second gets S4main3BBBV3fffSiyF:


    struct AAA {
        func fff() -> Int {
            return 8
        }
    }

    struct BBB {
        func fff() -> Int {
            return 8
        }
    }

S means Swift, 4 is the number of characters in the module name, and 3 is the number of characters in the type name (the V that follows marks a struct). In the name of the initializer, Si denotes the standard type Swift.Int.


Second, the labels and types of the function's arguments are added to the name. This makes overloading possible. For example, S4main3AAAV3fff3iiiS2i_tF is generated for the first method, and S4main3AAAV3fff3dddSiSd_tF for the second:


    struct AAA {
        func fff(iii internalName: Int) -> Int {
            return 8
        }

        func fff(ddd internalName: Double) -> Int {
            return 8
        }
    }

After the argument labels comes the return type, followed by the parameter types. The internal parameter names are not included. The mangling scheme is described in docs/ABI/Mangling.rst in the Swift repository, but it is an implementation detail of the compiler and has changed between versions.
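The length-prefix part of these names is easy to reproduce. The sketch below is a toy, not the real mangler: both helper names are hypothetical, and the "S" prefix, the "V" marker, and the "SiyF" suffix are copied verbatim from the first example rather than computed.

```swift
/// Encodes an identifier as <length><identifier>, e.g. "main" -> "4main".
func lengthPrefixed(_ identifier: String) -> String {
    return "\(identifier.count)\(identifier)"
}

/// Hypothetical helper: assembles a name shaped like the examples in the text
/// for a method `fff() -> Int` declared in a struct.
/// Only the length-prefixed identifiers are computed; the rest is hard-coded.
func toyMangledName(module: String, structName: String, method: String) -> String {
    return "S" + lengthPrefixed(module) + lengthPrefixed(structName) + "V"
        + lengthPrefixed(method) + "SiyF"
}

let first = toyMangledName(module: "main", structName: "AAA", method: "fff")
let second = toyMangledName(module: "main", structName: "BBB", method: "fff")
print(first)   // S4main3AAAV3fffSiyF
print(second)  // S4main3BBBV3fffSiyF
```

For real names it is safer to go in the opposite direction and decode them with the swift-demangle tool from the toolchain than to build them by hand.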


The function name is followed by its definition. It consists of one or more basic blocks. A basic block is a sequence of instructions with one entry point and one exit point that contains no branch instructions or early-exit conditions inside it.


The main function has a single basic block that accepts all the parameters passed to the function and contains all of its code, since the code has no branches:


 bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>): 

We can assume that each scope delimited by curly braces is a separate basic block. Suppose the code contains a branch:


    // before
    if 2 > 5 {
        // true
    } else {
        // false
    }
    // after

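In this case, at least four basic blocks will be generated: for the code before the branch, for the true branch, for the false branch, and for the code after the branch.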



cond_br is the conditional branch instruction. If the value of pseudo-register %14 is true, control transfers to block bb1; otherwise, to bb2. br is an unconditional branch that transfers control to the specified basic block:


    // before
    cond_br %14, bb1, bb2 // id: %15

    bb1:
    // true
    br bb3 // id: %21

    bb2: // Preds: bb0
    // false
    br bb3 // id: %27

    bb3: // Preds: bb2 bb1
    // after
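The block structure above can be modelled as a small control-flow graph. The following is a sketch with hypothetical type names; it only shows how the "Preds" comments follow from the terminators, and it assumes that bb3 ends with a return, which the listing does not show.

```swift
// How a block ends: every basic block has exactly one terminator.
enum Terminator {
    case branch(String)                                      // br bbN
    case conditionalBranch(ifTrue: String, ifFalse: String)  // cond_br %c, bbT, bbF
    case ret                                                 // return (assumed for bb3)
}

struct BasicBlock {
    let name: String
    let terminator: Terminator

    /// Names of the blocks that control can reach directly from this one.
    var successors: [String] {
        switch terminator {
        case .branch(let target):
            return [target]
        case .conditionalBranch(let ifTrue, let ifFalse):
            return [ifTrue, ifFalse]
        case .ret:
            return []
        }
    }
}

/// Maps each block name to the names of the blocks that can jump to it.
func predecessors(of blocks: [BasicBlock]) -> [String: [String]] {
    var result: [String: [String]] = [:]
    for block in blocks {
        for successor in block.successors {
            result[successor, default: []].append(block.name)
        }
    }
    return result
}

let blocks = [
    BasicBlock(name: "bb0", terminator: .conditionalBranch(ifTrue: "bb1", ifFalse: "bb2")),
    BasicBlock(name: "bb1", terminator: .branch("bb3")),
    BasicBlock(name: "bb2", terminator: .branch("bb3")),
    BasicBlock(name: "bb3", terminator: .ret),
]
let preds = predecessors(of: blocks)
print(preds["bb3"] ?? [])  // ["bb1", "bb2"]
```

The compiler prints the same set for bb3, only in a different order ("Preds: bb2 bb1").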




SIL guaranteed transformations


The raw intermediate representation obtained at the previous stage is checked for correctness and transformed into canonical form: functions marked transparent are inlined (the function call is replaced by the function body), constant expressions are evaluated, functions that return a value are verified to do so on every code path, and so on.


These transformations are mandatory and are performed even when code optimization is disabled.


Generating canonical SIL


To generate the canonical SIL, the -emit-sil flag is used:


 swiftc -emit-sil main.swift 

The result of the command:


    sil_stage canonical

    import Builtin
    import Swift
    import SwiftShims

    let x: Int

    // x
    sil_global hidden [let] @$S4main1xSivp : $Int

    // main
    sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
    bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
      alloc_global @$S4main1xSivp // id: %2
      %3 = global_addr @$S4main1xSivp : $*Int // user: %6
      %4 = integer_literal $Builtin.Int64, 16 // user: %5
      %5 = struct $Int (%4 : $Builtin.Int64) // user: %6
      store %5 to %3 : $*Int // id: %6
      %7 = integer_literal $Builtin.Int32, 0 // user: %8
      %8 = struct $Int32 (%7 : $Builtin.Int32) // user: %9
      return %8 : $Int32 // id: %9
    } // end sil function 'main'

    // Int.init(_builtinIntegerLiteral:)
    sil public_external [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int {
    // %0 // user: %2
    bb0(%0 : $Builtin.Int2048, %1 : $@thin Int.Type):
      %2 = builtin "s_to_s_checked_trunc_Int2048_Int64"(%0 : $Builtin.Int2048) : $(Builtin.Int64, Builtin.Int1) // user: %3
      %3 = tuple_extract %2 : $(Builtin.Int64, Builtin.Int1), 0 // user: %4
      %4 = struct $Int (%3 : $Builtin.Int64) // user: %5
      return %4 : $Int // id: %5
    } // end sil function '$SSi22_builtinIntegerLiteralSiBi2048__tcfC'

In this simple example, little changes. To see the optimizer really at work, the code needs to be a bit more complex. For example, add an addition:


 let x = 16 + 8 

In its raw SIL, you can find the call that adds these literals:


    %13 = function_ref @$SSi1poiyS2i_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Int // user: %14
    %14 = apply %13(%8, %12, %4) : $@convention(method) (Int, Int, @thin Int.Type) -> Int // user: %15

In the canonical SIL the call is gone. Instead, the constant value 24 is used:


 %4 = integer_literal $Builtin.Int64, 24 // user: %5 
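The idea behind this replacement, constant folding, can be sketched on a tiny expression tree. The compiler works on SIL instructions rather than on a tree like this, and the Expr type and fold function are hypothetical names, so take it as an illustration of the principle only.

```swift
// A tiny expression tree: integer literals, variables, and additions.
indirect enum Expr: Equatable {
    case literal(Int)
    case variable(String)
    case add(Expr, Expr)
}

/// Folds additions whose operands are both literals; leaves everything else intact.
func fold(_ expr: Expr) -> Expr {
    switch expr {
    case .literal, .variable:
        return expr
    case .add(let lhs, let rhs):
        let left = fold(lhs)
        let right = fold(rhs)
        if case .literal(let a) = left, case .literal(let b) = right {
            // Do not fold on overflow; a real compiler would report a diagnostic here.
            let (sum, overflow) = a.addingReportingOverflow(b)
            if !overflow { return .literal(sum) }
        }
        return .add(left, right)
    }
}

let folded = fold(.add(.literal(16), .literal(8)))
let untouched = fold(.add(.variable("y"), .literal(8)))
print(folded == .literal(24))                             // true
print(untouched == .add(.variable("y"), .literal(8)))     // true
```

An addition with a non-constant operand is left as it is, which is why the folding is visible only for expressions like 16 + 8.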




SIL optimization


Additional Swift-specific transformations are applied when optimization is enabled. Among them are generic specialization (optimizing generic code for a concrete type parameter), devirtualization (replacing dynamic calls with static ones), inlining, ARC optimization, and many others. Explaining these techniques would not fit into this already overgrown article.
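To give at least a feel for one of them, here is a hand-written illustration of what generic specialization amounts to. The optimizer does this on SIL, not on source code, and the function names are hypothetical; the second function only approximates what a specialized copy would look like.

```swift
// Generic version: without specialization, the operations on T are resolved
// through the Numeric conformance of the concrete type at run time.
func sum<T: Numeric>(_ values: [T]) -> T {
    var total: T = 0
    for value in values { total += value }
    return total
}

// Roughly what a specialization for T == Int amounts to (written by hand here):
// a copy of the function with concrete types and direct integer addition.
func sumSpecializedForInt(_ values: [Int]) -> Int {
    var total = 0
    for value in values { total += value }
    return total
}

let numbers = [1, 2, 3, 4]
print(sum(numbers) == sumSpecializedForInt(numbers))  // true
```

To look at what the optimizer actually produces, the canonical SIL can be emitted with optimization enabled, for example with swiftc -O -emit-sil main.swift.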





Since SIL is specific to Swift, I have not shown an implementation example this time. We will return to the parenthesis compiler in the next part, which covers the generation of LLVM IR.



Source: https://habr.com/ru/post/438696/