Automated Smart Contract Audit Guide. Part 1: Preparing for an Audit

Introduction

Our company audits the security of smart contracts, and we take the use of automated tools very seriously. How much can they help in spotting suspicious places, which of them are worth using, what can they do, and what is specific about working in this area? These and related questions are the subject of this article. The material consists of attempts to work with real contracts using the most interesting tools, together with recipes for launching this extremely diverse and wildly interesting software. At first I wanted to write a single article, but the amount of information soon became too large, so it turned into a series of articles, one per analyzer. A list from which we will take tools is presented, for example, here, but if other interesting tools turn up along the way, I will gladly describe and test them.

I must say that audit tasks turned out to be extremely interesting, since until now developers have paid little attention to the economic aspects of algorithms and to internal optimization. Smart contract audits also added some interesting attack vectors that have to be considered when searching for errors. And, as it turned out, quite a lot of tools for automated testing have appeared: static analyzers, bytecode analyzers, fuzzers, parsers and plenty of other good software.

The purpose of the article is to promote secure contract code and to let developers quickly and easily get rid of the silly bugs that are often the most painful. When the protocol itself is completely reliable and solves a serious problem, a silly mistake overlooked during testing can seriously ruin a project's life. So let us learn to use, at a minimum, the tools that get rid of well-known problems at little cost.

Looking ahead, I must say that the critical bugs we encounter most often in audits are still logical implementation problems rather than typical vulnerabilities such as access rights, integer overflow or reentrancy. A large, full audit is impossible without experienced developers who can audit the high-level logic of contracts, their lifecycle, aspects of actual operation and compliance with the task, and not just typical attack patterns. It is high-level logic that often becomes the source of critical bugs.

But warnings, typical holes and careless mistakes that should not be missed are the domain of automatic analyzers; they should cope with these tasks better than people. It is this thesis that we will put to the test.

Features of auditing smart contract code

Smart contract code auditing is a fairly specific area. Despite its small size, a smart contract in Ethereum is a full-fledged program capable of complex branching, loops and decision trees, and automating even a seemingly simple transaction requires thinking through all possible branches at every step. From this point of view, blockchain development is extremely low-level, very resource-intensive and strongly reminiscent of developing system and embedded software in C/C++ and assembly. That is why at interviews we love to see developers of low-level algorithms, network stacks and high-load services: everyone who has dealt with low-level optimization and code auditing.

From the developer's point of view, Solidity is also quite specific, although almost any programmer can read it and at first it seems extremely simple. Its code looks familiar to any developer who knows C/C++-style syntax and OOP, as in JavaScript.

Here, simplicity of code is the key to survival. Nothing heavy works, so the whole arsenal of low-level development comes into play, namely algorithms that use resources efficiently and save memory: Merkle trees, Bloom filters, lazy loading of resources, loop unrolling, manual garbage collection and much more.

Small source code and small resulting bytecode

A single smart contract is limited by the size of its bytecode: each byte costs some gas, and the maximum is capped, so roughly 24 KB (24,576 bytes under EIP-170, at the time of writing) is all you can push into the blockchain; anything larger will not deploy. Here's a good article on how much it costs to deploy a contract and how much gas it costs. So you cannot fit much in. Roughly speaking, a few thousand lines of "average" code is the maximum. A few dozen methods, no aggregation and no particularly complex logic are typical of contracts. Anything that does not fit has to be split out into separate libraries, which changes and complicates the order of computation in the network. Solidity developers might be happy to cram a pile of code into one contract, but they are simply forced to structure their contract systems properly, creating separate library classes with their own storage. Such separate "classes" are conveniently laid out in separate files, so contract code is quite pleasant to read: everything is well structured from the start, because otherwise it would not work. As an example, I recommend looking at how ERC721 is implemented in openzeppelin-solidity.

Gas, gas, gas

Gas adds an extra layer of logic to the execution of contract code, and that layer needs auditing too. Moreover, unlike traditional code, one and the same code fragment can consume different amounts of gas. A table of EVM opcodes and their costs is useful for understanding gas limitations; here it is.

To demonstrate why so much time has to be devoted to estimating gas, consider this piece of pseudocode (unrealistic, of course: paying out ether in a loop is a bad idea):

 // the function simply records an action code for the account on the blockchain
 function fixSomeAccountAction(uint _actionId) public onlyValidator {
     // ...
     events[msg.sender].push(_actionId);
 }

 // the user calls this function, which sums the rewards for each action type and pays them out
 function receivePaymentForSavedActions() {
     // ...
     for (uint256 i = 0; i < events[msg.sender].length; i++) {
         // take the actionId from the array
         uint actionId = events[msg.sender][i];
         // compute the reward for this kind of action
         uint payment = getPriceByEventId(actionId);
         if (payment > 0) {
             paymentAccumulators[msg.sender] += payment;
         }
         emit LogEventPaymentForAction(msg.sender, actionId, payment);
         // ...
         // delete "events[msg.sender][i]" from array
     }
 }

The point is that the loop in the contract runs events[msg.sender].length times, and each iteration writes to the blockchain (transfer() and emit). If the array is short, the loop runs a dozen times, paying out for each action. But if events[msg.sender] is large, there will be many iterations, and the gas spent will hit the hard block gas limit (~8,000,000). The transaction will fail, and from then on it will never succeed, because the contract has no way to shrink the events[msg.sender] array. If a loop does not merely compute a value but writes to the blockchain (for example, paying out commissions or payments for actions), the allowed number of iterations is quite tightly limited. Judge for yourself: the limit is 8,000,000, and writing a new 256-bit value costs 20,000. That is, in one transaction you can save or update only a couple of hundred addresses, each with some metadata. Another fun detail: writing a new value costs 20,000, while updating an existing one costs 5,000, so even with a completely identical contract environment, a token transfer to an address that already holds tokens spends 4 times less gas on the write (5,000 vs 20,000).
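
The arithmetic above can be sketched in a few lines of JavaScript. This is a rough estimate only: it uses the figures quoted in the text (a block gas limit of about 8,000,000, 20,000 gas for writing a new storage value and 5,000 for updating an existing one) and ignores every other cost, such as the base transaction fee, loop overhead and event logging. The helper name maxStorageWrites is made up for illustration.

```javascript
// Rough upper bound on storage writes per transaction, using the figures from the text.
// All other costs (base transaction fee, loop overhead, events) are ignored here.
const BLOCK_GAS_LIMIT = 8000000; // approximate block gas limit quoted in the text
const SSTORE_NEW = 20000;        // gas for writing a new 256-bit storage value
const SSTORE_UPDATE = 5000;      // gas for updating an existing value

// Hypothetical helper: how many writes of a given cost fit into a gas budget.
function maxStorageWrites(gasBudget, gasPerWrite) {
  return Math.floor(gasBudget / gasPerWrite);
}

const maxNewWrites = maxStorageWrites(BLOCK_GAS_LIMIT, SSTORE_NEW);
const maxUpdates = maxStorageWrites(BLOCK_GAS_LIMIT, SSTORE_UPDATE);

console.log("new values per transaction, at most:", maxNewWrites);
console.log("value updates per transaction, at most:", maxUpdates);
```

Even this optimistic bound leaves room for only a few hundred new values, so a loop over an unbounded array can become impossible to execute long before the array looks large by ordinary programming standards.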

So do not be surprised that gas is so closely tied to the security of smart contracts: a situation where funds are stuck in a contract is, for all practical purposes, not much different from one where they were stolen. The fact that the ADD instruction costs 3 gas while SSTORE (saving to storage) costs 20,000 tells us that storage is the most expensive resource in the blockchain, and contract code optimization has a lot in common with low-level development in C and assembly for embedded systems, where storage is also a very limited resource.

Beautiful blockchain

This is a very positive paragraph about why the blockchain is so good for the auditor from a security point of view. Deterministic execution of contract code is the key to successful debugging and to reproducing bugs and vulnerabilities. Technically, any call to contract code can be replayed on any platform with bit-for-bit accuracy, which lets tests run everywhere and be extremely easy to maintain, and makes incident investigation reliable and indisputable. We always know who called which function, with which parameters, which code processed the call and what the result was. All of this is completely deterministic, i.e. reproducible anywhere, even in JS on a web page. In the case of Ethereum, any test case is extremely easy to write in convenient JavaScript, including parameter fuzzing, and it runs fine anywhere Node.js is available.
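
The same reproducibility is easy to carry over to test inputs. Below is a minimal sketch of seeded parameter fuzzing in plain Node.js: a small deterministic pseudo-random generator (the widely used mulberry32 routine) produces the same sequence of inputs for the same seed, so a failing case can be replayed exactly. The function under test, priceForNights, is a made-up stand-in for a contract call and is not taken from the booking contract; the checked property is likewise only an illustration.

```javascript
// Small deterministic PRNG (mulberry32): the same seed always gives the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical function under test; a stand-in for a call into a contract.
function priceForNights(pricePerNight, nights) {
  return pricePerNight * nights;
}

// Run `rounds` fuzz iterations with inputs derived from `seed`.
// Returns the list of failing inputs (empty if the property held every time).
function fuzz(seed, rounds) {
  const rand = mulberry32(seed);
  const failures = [];
  for (let i = 0; i < rounds; i++) {
    const pricePerNight = Math.floor(rand() * 1000);
    const nights = Math.floor(rand() * 30);
    const total = priceForNights(pricePerNight, nights);
    // Property: the total is never negative and is zero only when an input is zero.
    const zeroInput = pricePerNight === 0 || nights === 0;
    const ok = total >= 0 && (total === 0) === zeroInput;
    if (!ok) failures.push({ pricePerNight, nights, total });
  }
  return failures;
}

// Draw `n` raw values for a seed; two calls with the same seed return equal arrays.
function drawValues(seed, n) {
  const rand = mulberry32(seed);
  return Array.from({ length: n }, () => rand());
}

console.log("failing inputs:", fuzz(42, 1000).length);
```

Logging the seed next to every failure is enough to reproduce it later: calling fuzz with the same seed and number of rounds regenerates exactly the same inputs.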

All these fine words, however, should not make developers relax, because, as mentioned above, the most serious mistakes are logical ones, and determinism of execution is orthogonal to them.

Contract Build Environment

To write this article, I took an old experimental housing-booking contract from the Smartz constructor: https://github.com/smartzplatform/constructor-eth-booking . The contract lets you create a record for an object (an apartment or a hotel room) and set the price and the dates, after which it waits for payment and, once payment is received, records the booking and holds the funds on its balance until the guest checks in and confirms it. At that point the owner of the room gets paid. The contract is essentially a state machine whose states and transitions can be seen in Booking.sol. We wrote it fairly quickly, changed it in the course of development and did not have time to write many tests; it uses a far from new compiler version and has fairly rich internal logic. So let's see how the analyzers deal with it and what errors they find, and, if necessary, add our own.

Working with different versions of solc

Different analyzers have to be used in different ways: some run from Docker, others take ready-made compiled bytecode, and the auditor has to deal not with a couple of contracts but with dozens, written for different compiler versions. So you need to be able to slip in different versions of solc in different ways: on the host system, inside a Docker image and inside Truffle. Here are several dirty hacks for this:

Way 1: inside Truffle

No tricks are needed here, because since Truffle 5.0.0 you can specify the compiler version directly in truffle.js, as in this diff.

Now Truffle will download the required compiler and run it by itself. Many thanks to the team for this: Solidity is young, changes in the language are serious, and for an auditor moving code from version to version is unacceptable, since this way you can introduce new mistakes and mask old ones.

Way 2: replacing /usr/bin/solc in the analyzer's Docker container

If the analyzer is distributed as a Dockerfile, you can swap the compiler when building the Docker image by adding a line to the Dockerfile that takes the required solc version straight from the solc image, pulled from the network, and puts it in place of /usr/bin/solc:

 COPY --from=ethereum/solc:0.4.19 /usr/bin/solc /usr/bin

Way 3: replacing /usr/bin/solc

The dirtiest, most head-on way: if there is no other way out, you can shamelessly replace the /usr/bin/solc binary with a script like this (remember to save the original file):

 #!/bin/bash
 # run Solidity compiler of given version, pass all parameters
 # you can run "SOLC_DOCKER_VERSION=0.4.20 solc --version"
 SOLC_DOCKER_VERSION="${SOLC_DOCKER_VERSION:-0.4.24}"
 # paths are quoted so that directories with spaces in their names still work
 docker run \
     --entrypoint "" \
     --tmpfs /tmp \
     -v "$(pwd)":/project \
     -v "$(pwd)/node_modules":/project/node_modules \
     -w /project \
     "ethereum/solc:$SOLC_DOCKER_VERSION" \
     /usr/bin/solc \
     "$@"

It downloads and caches the Docker image with the required solc version, mounts the current directory as the working directory, and runs /usr/bin/solc with the parameters passed. Not a great way, but it may suit some tasks.

Flattening code

Now let's deal with the sources. In theory, of course, analyzers (especially static source code analyzers) should build the contract, pull in all dependencies, assemble everything into one monolith and analyze it. But, as I have already said, changes from version to version can be serious, and I kept running into the need to add an extra directory to the Docker container and configure paths inside it, all so that the analyzer would pull in the required imports correctly. Some analyzers understand everything, others do not, so as a universal, hassle-free option it is more convenient to merge everything into a single file and feed only that file to the analyzer.

For this, use the ordinary truffle-flattener.

It is a standard npm module and is very simple to use:

 truffle-flattener contracts/Booking.sol > contracts/flattened.sol

If you need to customize flattening in some way, you can write your own flattener; for example, we previously used a Python-based one: https://github.com/mixbytes/solidity-flattener
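
To show what a flattener does in principle, here is a minimal sketch. It assumes that all imports are plain relative paths of the form import "./File.sol"; and that the sources are already loaded into memory as a map from path to text. It inlines each file once, dependencies first. This is not how truffle-flattener or solidity-flattener are implemented, and it ignores pragma de-duplication, import aliases and node_modules resolution; the function name flatten and the sample sources are made up.

```javascript
const path = require("path");

// Matches simple imports such as: import "./Ownable.sol";
const IMPORT_RE = /^\s*import\s+["']([^"']+)["']\s*;\s*$/;

// Hypothetical helper: inline `entry` and everything it imports into one string.
// `sources` maps a POSIX-style path to the text of that file.
function flatten(sources, entry, seen = new Set()) {
  const file = path.posix.normalize(entry);
  if (seen.has(file)) return ""; // each file is emitted only once
  if (!Object.prototype.hasOwnProperty.call(sources, file)) {
    throw new Error("missing source: " + file);
  }
  seen.add(file);

  const out = [];
  for (const line of sources[file].split("\n")) {
    const match = IMPORT_RE.exec(line);
    if (match) {
      // Resolve the import relative to the importing file and inline it in place.
      const target = path.posix.join(path.posix.dirname(file), match[1]);
      out.push(flatten(sources, target, seen));
    } else {
      out.push(line);
    }
  }
  // Drop empty chunks (already-inlined imports and blank lines) and join the rest.
  return out.filter((chunk) => chunk !== "").join("\n");
}

// Made-up sample sources, only to demonstrate the call.
const sources = {
  "contracts/Booking.sol": 'import "./lib/Ownable.sol";\ncontract Booking is Ownable {}',
  "contracts/lib/Ownable.sol": "contract Ownable {}",
};

const flattened = flatten(sources, "contracts/Booking.sol");
console.log(flattened);
```

The seen set is what keeps a shared dependency from being pasted twice, which is the main thing a compiler would otherwise complain about in a naively concatenated file.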

Let's start the analysis.

We will continue the analysis using our old friend https://github.com/smartzplatform/constructor-eth-booking as the example. The contract targets an old compiler version, 0.4.20, and I deliberately took an old contract in order to work through the compiler problems. What makes it worse is that an analyzer that studies bytecode, for example, may depend on the solc version, and version mismatches can strongly affect the results or even break everything. So even if you do everything by the book and use the latest versions, you can still run into an analyzer tailored to a previous compiler version.
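
Before feeding a contract to an analyzer, it helps to know which compiler version the source actually asks for. Here is a minimal sketch, assuming the file contains an ordinary pragma solidity line: the regular expression extracts the version constraint so it can be compared with the solc that the analyzer ships. The helper name pragmaVersion and the sample source are made up; real constraints can be ranges, which this sketch returns verbatim without interpreting them.

```javascript
// Extract the version constraint from a "pragma solidity ...;" line, or null if absent.
function pragmaVersion(source) {
  const match = /^\s*pragma\s+solidity\s+([^;]+);/m.exec(source);
  return match ? match[1].trim() : null;
}

// Made-up sample source, only to demonstrate the call.
const exampleSource = "pragma solidity ^0.4.20;\n\ncontract Booking {}\n";
const constraint = pragmaVersion(exampleSource);
console.log("required compiler:", constraint);
```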

Compiling and running tests

To get started, just pull the project from GitHub and try to compile it:

 git clone https://github.com/smartzplatform/constructor-eth-booking.git
 cd constructor-eth-booking
 npm install
 truffle compile

You will almost certainly have a problem with the compiler version. Analyzers have the same problem, so use any means to get compiler 0.4.20 and build the project. I simply specified the correct compiler version in truffle.js, as described above, and everything built.

Also run

 truffle-flattener contracts/Booking.sol > contracts/flattened.sol

As mentioned in the section on flattening, we will feed contracts/flattened.sol to the various analyzers.

Conclusion to the introductory part

Now, with flattened.sol and the ability to use an arbitrary solc version, you can begin the analysis. I will skip the problems with running Truffle and the tests; there is plenty of documentation on the subject, so sort it out yourself. Of course, the tests must be run and must pass. In addition, to check the logic, the auditor often has to add tests of their own for potentially vulnerable places, for example checking how the contract behaves at array boundaries, covering all variables with tests, even those intended purely for data storage, and so on. There are many recommendations here, and besides, this is exactly the product our company supplies to the market, so the study of logic is a purely human task.

We will go through the analyzers that we find interesting, try to feed our contract to them, and artificially introduce vulnerabilities into it to see how the analyzers react. The next article is devoted to the Slither analyzer, and the overall plan is roughly as follows:

Part 1. Introduction. Compilation, flattening, Solidity versions (this article)
Part 2. Slither
Part 3. Mythril
Part 4. Manticore
Part 5. Echidna
Part 6. Unknown tool 1
Part 7. Unknown tool 2

This set of analyzers was chosen because an auditor needs to be able to use different types of analysis, static and dynamic, and they require completely different approaches. Our task is to learn the basic tools for each type of analysis and to understand which one to use when.

Perhaps in the course of detailed research new candidates will appear or the order of the articles will change, so stay tuned. To go to the next part, click here.

Source: https://habr.com/ru/post/438336/