Fearless Security: Memory Safety in Rust

Last year, Mozilla shipped Quantum CSS in Firefox, the culmination of eight years of investment in Rust, a memory-safe systems programming language, and of more than a year spent rewriting a major browser component in Rust.

Until now, all major browser engines have been written in C++, mostly for performance reasons. But with great performance comes great responsibility: C++ programmers have to manage memory manually, which opens a Pandora's box of vulnerabilities. Rust not only prevents these kinds of errors, but the techniques it uses to do so also prevent data races, allowing programmers to reason more effectively about parallel code.

What is memory safety

When we talk about building secure applications, we often focus on memory safety. Informally, this means that in no possible state can a program access invalid memory. Violations include:

using a pointer after its memory has been freed (use-after-free);
dereferencing a null pointer;
using uninitialized memory;
attempting to free the same memory twice (double free);
buffer overflow.

For a more formal definition, see Michael Hicks' article “What is memory safety?”, as well as an academic paper on the topic.

Such violations can cause a program to crash unexpectedly or to behave in unintended ways. Potential consequences include information leakage, arbitrary code execution, and remote code execution.

Memory management

Memory management is critical to both the performance and the security of applications. This section covers the basic memory model. One key concept is the pointer: a variable that stores a memory address. If we visit that address, we find some data there, so we say that the pointer is a reference to (or points to) that data. Just as a home address tells people where to find you, a memory address tells the program where to find data.

Everything in the program is located at specific memory addresses, including code instructions. Incorrect use of pointers can lead to serious vulnerabilities, including information leakage and the execution of arbitrary code.

Allocation / Release

When we create a variable, the program needs to allocate enough space in memory to store that variable's data. Since the memory available to each process is finite, we also need a way to free resources. When memory is freed, it becomes available for storing new data, but the old data remains there until it is overwritten.

Buffers

A buffer is a contiguous area of memory that stores multiple instances of the same data type. For example, the phrase "My cat is Batman" would be stored in a 16-byte buffer. A buffer is defined by a starting address and a length. To avoid corrupting data in neighboring memory, it is important to make sure that we never read or write outside the buffer.
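As a rough illustration of "starting address plus length", here is a small Rust sketch. The helper names buffer_len and read_byte are made up for this example; the phrase is stored as ASCII, so each character takes one byte.

```rust
/// Number of bytes needed to store `text` (UTF-8 encoded).
fn buffer_len(text: &str) -> usize {
    text.as_bytes().len()
}

/// Reads one byte from the buffer, or returns `None` when `index`
/// lies outside the buffer.
fn read_byte(text: &str, index: usize) -> Option<u8> {
    text.as_bytes().get(index).copied()
}

fn main() {
    let phrase = "My cat is Batman";
    // A string slice is a (starting address, length) pair.
    println!("starts at {:p}, {} bytes long", phrase.as_ptr(), buffer_len(phrase));
    println!("first byte: {:?}", read_byte(phrase, 0));
    // Index 16 is one past the end, so the checked read yields None.
    println!("byte 16: {:?}", read_byte(phrase, 16));
}
```

Valid indices run from 0 to 15; anything beyond that belongs to some other piece of memory.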

Control flow

Programs are made up of subroutines that execute in a particular order. At the end of a subroutine, the computer jumps to a stored pointer to the next part of the code (called the return address). When the program jumps to the return address, one of three things happens:

The process continues normally (the return address was not altered).
The process crashes (the return address was altered and points to non-executable memory).
The process continues, but not as expected (the return address was altered and the control flow has changed).

How languages achieve memory safety

Programming languages fall along a spectrum. At one end are languages such as C and C++: they are efficient, but require manual memory management. At the other end are interpreted languages with automatic memory management (for example, reference counting or garbage collection (GC)), which pay for safety with performance. Even languages with highly optimized garbage collectors cannot match the performance of languages without GC.

Manual memory management

Some languages (for example, C) require programmers to manage memory manually: they decide when and how much memory to allocate, and when to free it. This gives the programmer complete control over how the program uses resources, enabling fast and efficient code. But this approach is error-prone, especially in complex code bases.

Mistakes that are easy to make include:

forgetting that resources have been freed and trying to use them;
not allocating enough space to store the data;
reading memory outside the bounds of a buffer.

Smart pointers

Smart pointers carry additional information that helps prevent memory misuse. They can be used for automatic memory management and bounds checking. Unlike a raw pointer, a smart pointer can destroy itself and does not wait for the programmer to delete it manually.

Several variations exist that wrap the raw pointer in useful abstractions. Some smart pointers count the references to each object, while others implement a scoping policy that constrains the pointer's lifetime to particular conditions.

With reference counting, resources are freed when the last reference to the object is destroyed. Basic reference counting implementations can suffer from performance and memory overhead and can be difficult to use in multi-threaded environments. If objects refer to each other (cyclic references), the reference count of each object never reaches zero, so more sophisticated methods are required.
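To make the counting concrete, here is a minimal sketch using Rust's standard reference-counted pointer, std::rc::Rc. The function names are invented for illustration. Weak references are shown as one common way to avoid cycles, not as the only one.

```rust
use std::rc::Rc;

/// Returns the strong reference count before, during, and after
/// two extra handles to the same value exist.
fn observe_counts() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let before = Rc::strong_count(&first);
    let second = Rc::clone(&first); // increments the count, no deep copy
    let third = Rc::clone(&first);
    let during = Rc::strong_count(&first);
    drop(second); // each drop decrements the count
    drop(third);
    let after = Rc::strong_count(&first);
    (before, during, after)
}

/// A weak reference does not keep the value alive, which is why it can
/// be used for back-pointers without creating a cycle.
fn weak_is_dead_after_last_strong_drop() -> bool {
    let strong = Rc::new(5);
    let weak = Rc::downgrade(&strong);
    drop(strong); // last strong reference gone: the value is freed
    weak.upgrade().is_none()
}

fn main() {
    println!("{:?}", observe_counts());
    println!("{}", weak_is_dead_after_last_strong_drop());
}
```

Rc is single-threaded by design; the standard library offers Arc, with atomic counters, for sharing across threads.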

Garbage collection

Some languages (for example, Java, Go, Python) are garbage collected. A part of the runtime environment called the garbage collector (GC) keeps track of variables and identifies unreachable resources in the graph of references between objects. Once an object becomes unreachable, the GC frees the underlying memory for future reuse. All allocation and deallocation happens without an explicit command from the programmer.

Although GC ensures that memory is always used validly, it does not free memory in the most efficient way: the last use of an object can occur long before the garbage collector frees it. The cost can be prohibitive for performance-critical applications: avoiding a slowdown sometimes requires up to 5 times as much memory.

Ownership

Rust uses ownership to achieve both high performance and memory safety. More formally, this is an example of affine typing. All Rust code follows a set of ownership rules that let the compiler manage memory without run-time overhead:

Each value has a variable called its owner.
There can be only one owner at a time.
When the owner goes out of scope, the value is dropped.

Values can be moved or borrowed between variables. These rules are enforced by a part of the compiler called the borrow checker.

When a variable goes out of scope, Rust frees its memory. In the following example, when s1 and s2 go out of scope, both would try to free the same memory, resulting in a double-free error. To prevent this, when a value is moved out of a variable, the previous owner becomes invalid. If the programmer then tries to use the invalid variable, the compiler rejects the code. This can be avoided by making a deep copy of the data or by using references.
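The third rule can be observed directly. The sketch below uses a made-up Noisy type whose Drop implementation records its name in a log, so the order in which values are dropped at the end of each scope becomes visible. It is only an illustration of the rule, not of how the compiler works internally.

```rust
use std::cell::RefCell;

/// Hypothetical type that records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order of events as owners go out of scope.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // `_inner` goes out of scope here and is dropped
        log.borrow_mut().push("between");
    } // `_outer` goes out of scope here and is dropped
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

No call to a free function appears anywhere: the end of the scope is the deallocation point.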

Example 1: Moving ownership

    let s1 = String::from("hello");
    let s2 = s1;

    // won't compile because s1 is now invalid
    println!("{}, world!", s1);
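For comparison, here is a sketch of the two ways out mentioned above: a deep copy and a reference. The greet helper is hypothetical and exists only to show a function that borrows its argument instead of taking ownership.

```rust
/// Borrows the text instead of taking ownership of it.
fn greet(s: &str) -> String {
    format!("{}, world!", s)
}

fn main() {
    let s1 = String::from("hello");

    // Deep copy: s1 and s2 own two separate heap buffers.
    let s2 = s1.clone();

    // Borrow: greet only looks at s1, which remains the owner.
    println!("{}", greet(&s1));
    println!("{}", greet(&s2));

    // Move: from here on s1 is invalid and s3 is the only owner.
    let s3 = s1;
    println!("{}", greet(&s3));
}
```

A clone costs an allocation and a copy, while a borrow is free at run time, which is why references are usually preferred when the callee does not need to own the data.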

Another set of rules enforced by the borrow checker concerns variable lifetimes. Rust prohibits the use of uninitialized variables and of dangling pointers to objects that no longer exist. If the code in the example below compiled, r would refer to memory that is freed when x goes out of scope: a dangling pointer. The compiler tracks scopes and checks that every borrow is valid, occasionally requiring the programmer to annotate a lifetime explicitly.

Example 2: A dangling pointer

    let r;
    {
        let x = 5;
        r = &x;
    } // x is dropped here while r still borrows it

    // won't compile: x does not live long enough
    println!("r: {}", r);
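The sketch below shows a variant that the compiler accepts, together with one case where a lifetime has to be written out. The function longest is an illustrative helper, not something from the article: its annotation 'a says that the returned reference is valid only as long as both arguments are.

```rust
/// Returns the longer of two string slices (the first one on a tie).
/// `'a` ties the result's lifetime to the lifetimes of both inputs.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    // x is declared in the same scope as r, so it is still alive
    // every time r is used.
    let x = 5;
    let r = &x;
    println!("r: {}", r);

    println!("{}", longest("borrow", "checker"));
}
```

Without the annotation the compiler cannot tell which argument the result borrows from, and it asks for one instead of guessing.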

The ownership model provides a solid foundation for correct memory access, preventing undefined behavior.

Memory vulnerabilities

The main consequences of memory vulnerabilities include:

Crash: accessing invalid memory can make the application terminate unexpectedly.
Information leakage: private data, including sensitive information such as passwords, is inadvertently exposed.
Arbitrary code execution (ACE): an attacker can execute arbitrary commands on the target machine. When this is possible over a network, we call it remote code execution (RCE).

Another problem is a memory leak, which occurs when allocated memory is not released after the program has finished using it. A leak can use up all available memory: requests for resources are then blocked, leading to denial of service. This is a memory-related problem that cannot be solved at the programming language level.

At best, a memory error makes the application crash. At worst, an attacker gains control of the program through the vulnerability (which may lead to further attacks).

Misusing freed memory (use-after-free, double free)

This subclass of vulnerabilities occurs when a resource is freed but a reference to its address is still kept around. It is a powerful exploitation technique that can lead to out-of-bounds access, information leakage, code execution, and more.

Languages with garbage collection or reference counting prevent the use of invalid pointers by destroying only unreachable objects (which can cost performance), while languages with manual memory management are especially prone to this vulnerability (particularly in complex code bases). Rust's borrow checker does not allow an object to be destroyed while references to it still exist, so these bugs are caught at compile time.
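A small sketch of what that looks like in practice. The function name is invented, and the commented-out line is the one the compiler would reject; the error code in the comment is what current compilers report for this situation, as far as I know.

```rust
/// Copies out the first element and only then frees the vector.
fn first_then_free(data: Vec<i32>) -> Option<i32> {
    let first = data.first(); // `first` borrows from `data`

    // drop(data); // rejected here (E0505): `data` is still borrowed

    let value = first.copied(); // last use of the borrow
    drop(data); // accepted: no references to `data` remain
    value
}

fn main() {
    println!("{:?}", first_then_free(vec![7, 8, 9]));
}
```

Moving the drop call one line up turns a would-be use-after-free into a compile error instead of a run-time bug.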

Uninitialized variables

If a variable is used before it is initialized, the memory it occupies can contain anything, including random garbage or previously discarded data, which leads to information leakage (uninitialized pointers of this kind are sometimes called wild pointers). To prevent these problems, memory-managed languages often run a default initialization routine after allocating memory.

As in C, most variables in Rust start out uninitialized. Unlike in C, you cannot read them before initialization. The following code will not compile:

Example 3: Using an uninitialized variable

    fn main() {
        let x: i32;
        // won't compile: x is read before it is initialized
        println!("{}", x);
    }
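Declaring a variable first and initializing it later is still allowed, as long as the compiler can see that every path assigns a value before the first read. A minimal sketch, with a made-up describe function:

```rust
/// Classifies a number; `label` is declared first and assigned later.
fn describe(n: i32) -> &'static str {
    let label: &str; // declared, not yet initialized
    if n < 0 {
        label = "negative";
    } else {
        label = "non-negative";
    }
    // Both branches assign `label`, so reading it here is accepted.
    label
}

fn main() {
    println!("{}", describe(-3));
    println!("{}", describe(3));
}
```

Removing the else branch would bring back the compile error, because one path would then reach the read without an assignment.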

Null pointers

When an application dereferences a pointer that turns out to be null, it usually just accesses garbage and crashes. In some cases, these vulnerabilities can lead to arbitrary code execution. Rust has two kinds of pointers: references and raw pointers. References are safe to use, but raw pointers can be a problem.

Rust prevents null pointer dereferencing in two ways:

Avoiding nullable pointers.
Avoiding raw pointer dereferences.

Rust avoids nullable pointers by replacing them with a special Option type. To work with the possibly-null value inside an Option, the language requires the programmer to handle the null case explicitly; otherwise the program will not compile.

What if nullable pointers cannot be avoided (for example, when interacting with code written in another language)? Try to isolate the damage. Any dereference of a raw pointer must occur in an isolated unsafe block. Such a block relaxes Rust's rules and permits certain operations that can cause undefined behavior (such as dereferencing a raw pointer).
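A short sketch of the idea. Both find_user and greeting are hypothetical names: the lookup returns an Option instead of a nullable pointer, and the caller cannot reach the name without saying what happens when there is none.

```rust
/// Hypothetical lookup: returns None instead of a null pointer.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        2 => Some("bob"),
        _ => None,
    }
}

/// The match must cover both Some and None, or the code is rejected.
fn greeting(id: u32) -> String {
    match find_user(id) {
        Some(name) => format!("hello, {}", name),
        None => String::from("no such user"),
    }
}

fn main() {
    println!("{}", greeting(1));
    println!("{}", greeting(42));
}
```

Deleting the None arm makes the match non-exhaustive, which is a compile error and not a crash waiting to happen.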
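One way to isolate the damage is to keep the raw pointer behind a small wrapper, as in the sketch below. The function read_or_default is invented for illustration and rests on a stated assumption: any non-null pointer passed in points to a valid, initialized i32. A production API would more likely be declared unsafe fn so that callers have to acknowledge that contract.

```rust
/// Reads through a raw pointer that may be null, for example one
/// handed over by C code.
/// Assumption: a non-null `ptr` points to a valid, initialized i32.
fn read_or_default(ptr: *const i32, default: i32) -> i32 {
    if ptr.is_null() {
        return default;
    }
    // The only place where the raw pointer is dereferenced.
    unsafe { *ptr }
}

fn main() {
    let value = 42;
    // A reference coerces to a raw pointer at the call site.
    println!("{}", read_or_default(&value, 0));
    println!("{}", read_or_default(std::ptr::null(), 0));
}
```

If something does go wrong with memory in this program, the single unsafe block is the first place to look.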

“Everything the borrow checker touches... but what about that shadowy place?”
“That's an unsafe block. You must never go there, Simba.”

Buffer overflow

So far we have discussed vulnerabilities that can be avoided by restricting access to undefined memory. A buffer overflow is different: it improperly accesses memory that is legally allocated. Like a use-after-free bug, an out-of-bounds access can also be a problem because it may reach freed memory that has not been reallocated yet and therefore still contains sensitive information that is supposed to no longer exist.

A buffer overflow simply means an out-of-bounds access. Because of how buffers are laid out in memory, overflows often leak information, which may include sensitive data such as passwords. In more serious cases, overwriting the instruction pointer makes ACE/RCE vulnerabilities possible.

Example 4: Buffer overflow (C code)

    #include <stdio.h>

    int main() {
        int buf[] = {0, 1, 2, 3, 4};

        // print out of bounds (undefined behavior)
        printf("Out of bounds: %d\n", buf[10]);

        // write out of bounds (undefined behavior)
        buf[10] = 10;
        printf("Out of bounds: %d\n", buf[10]);

        return 0;
    }

The simplest protection against buffer overflows is to always require a bounds check when accessing elements, but this costs performance.

What does Rust do? The built-in buffer types in the standard library require a bounds check for any random access, but also provide iterator APIs that can reduce the cost of bounds checks on sequential accesses. This ensures that out-of-bounds reads and writes are impossible for these types. Rust promotes patterns in which bounds checks occur only in the places where a programmer would almost certainly have to add them by hand in C/C++.
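A rough Rust counterpart to the C example, using the same five-element buffer. The helpers checked_read and sum are illustrative names; get and iter are the standard slice methods.

```rust
/// Checked random access: None instead of an out-of-bounds read.
fn checked_read(buf: &[i32], index: usize) -> Option<i32> {
    buf.get(index).copied()
}

/// Sequential access through an iterator: the iterator never leaves
/// the buffer, so no index needs to be validated by hand.
fn sum(buf: &[i32]) -> i32 {
    buf.iter().sum()
}

fn main() {
    let buf = [0, 1, 2, 3, 4];
    println!("index 4:  {:?}", checked_read(&buf, 4));
    println!("index 10: {:?}", checked_read(&buf, 10));
    println!("sum: {}", sum(&buf));
    // Plain indexing with an out-of-range run-time index panics
    // instead of touching neighboring memory.
}
```

The failed read becomes an ordinary value that the caller has to handle, and the neighboring memory is never touched.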

Memory safety is only half the battle

Memory safety violations open programs up to security vulnerabilities such as data leakage and remote code execution. There are various ways to ensure memory safety, including smart pointers and garbage collection. You can even formally prove memory safety. While some languages have accepted slower performance as the price of memory safety, Rust's ownership concept provides safety while minimizing overhead.

Unfortunately, memory errors are only part of the story when it comes to writing secure code. The next article looks at thread safety and attacks on concurrent code.

Source: https://habr.com/ru/post/438288/