Validating memory addresses on Cortex-M0 / M3 / M4 / M7

Hi, Habr!

Prompted by the recent relaxation of the rules, by the grumbling in the comments under a neighboring post that microcontroller articles never get past blinking an LED, and by the untimely death of my own blog, which I am still too lazy to restore, I am moving here some useful material about a trick for Cortex-M cores that has regrettably had little coverage: checking arbitrary addresses for validity.

One very useful capability of Cortex-M microcontrollers (all of them) that for some reason is almost never described in ready-to-use form is the ability to check whether a memory address is valid. With it you can determine the size of flash, RAM and EEPROM, detect whether specific peripherals and registers are present on a specific processor, kill crashed processes while keeping the OS as a whole running, and so on.

In normal operation, accessing a non-existent address on Cortex-M3 / M4 / M7 raises a BusFault exception, and if its handler is not enabled, the fault is escalated to HardFault. Cortex-M0 has no “detailed” exceptions (MemManage, BusFault, UsageFault), so any failure is immediately escalated to HardFault.

In general, a HardFault must not be ignored: it may be the consequence of a hardware failure, for example, after which the behavior of the device becomes unpredictable. But in this particular case ignoring it is both possible and necessary.

Cortex-M3 and Cortex-M4: the BusFault that never happens

On Cortex-M3 and above, checking an address is quite simple. Mask all exceptions (except, obviously, the non-maskable one) through the FAULTMASK register, tell the core to ignore BusFault specifically by setting the BFHFNMIGN bit in CCR, then poke the address in question and see whether the BFARVALID flag has been raised. This flag lives in the BusFault Status Register (part of CFSR) and means that BFAR, the Bus Fault Address Register, holds the faulting address. If the flag is set, a BusFault has just occurred, i.e. the address is invalid.

The code looks like this. All defines and functions come from the standard (non-vendor) CMSIS, so it should work on any M3, M4 or M7:

    /* Needs <stdbool.h>, <stdint.h> and the CMSIS core header of your device */
    bool cpu_check_address(volatile const char *address)
    {
        /* Cortex-M3, Cortex-M4, Cortex-M4F, Cortex-M7 are supported */
        static const uint32_t BFARVALID_MASK = (0x80 << SCB_CFSR_BUSFAULTSR_Pos);

        bool is_valid = true;

        /* Clear BFARVALID flag by writing 1 to it. CFSR bits are
           write-1-to-clear, so a plain write is used: a read-modify-write
           would also clear every other flag that happens to be set */
        SCB->CFSR = BFARVALID_MASK;

        /* Ignore BusFault by enabling BFHFNMIGN and disabling interrupts */
        uint32_t mask = __get_FAULTMASK();
        __disable_fault_irq();
        SCB->CCR |= SCB_CCR_BFHFNMIGN_Msk;

        /* probe address in question */
        *address;

        /* Check BFARVALID flag */
        if ((SCB->CFSR & BFARVALID_MASK) != 0) {
            /* Bus Fault occurred reading the address */
            is_valid = false;
        }

        /* Reenable BusFault by clearing BFHFNMIGN */
        SCB->CCR &= ~SCB_CCR_BFHFNMIGN_Msk;
        __set_FAULTMASK(mask);

        return is_valid;
    }
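For readers wondering where `0x80 << SCB_CFSR_BUSFAULTSR_Pos` comes from: CFSR packs three status registers into one word, the BusFault Status Register occupies bits 8 to 15, and BFARVALID is bit 7 of that byte. Below is a minimal host-side sketch of the arithmetic. The `DEMO_` constants and `demo_` functions are illustrative stand-ins invented for this example, not CMSIS identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for CMSIS SCB_CFSR_BUSFAULTSR_Pos: BFSR is byte 1 of CFSR. */
#define DEMO_CFSR_BUSFAULTSR_POS 8u
/* BFARVALID is bit 7 of BFSR, hence bit 15 of CFSR. */
#define DEMO_BFARVALID_MASK (0x80u << DEMO_CFSR_BUSFAULTSR_POS)

/* Extract the BusFault Status Register byte from a raw CFSR value. */
uint8_t demo_cfsr_to_bfsr(uint32_t cfsr)
{
    return (uint8_t)((cfsr >> DEMO_CFSR_BUSFAULTSR_POS) & 0xFFu);
}

/* True if the raw CFSR value says that BFAR holds a valid fault address. */
bool demo_bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & DEMO_BFARVALID_MASK) != 0u;
}
```

A raw CFSR value of `0x00008200`, for instance, decodes to a BFSR byte of `0x82`: BFARVALID together with bit 1, the precise data bus error flag.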

Cortex-M0 and Cortex-M0+

With Cortex-M0 and Cortex-M0+ things get harder: as I said above, they have neither BusFault nor any of the related registers, and exceptions escalate straight to HardFault. So there is only one way out: make the HardFault handler able to recognize that the exception was caused on purpose, and have it return to the function that triggered it, passing back a flag that says a HardFault occurred.

This is done purely in assembler. In the example below, register R5 is set to 1, and two “magic numbers” are written to registers R1 and R2. If a HardFault occurs when we try to load the value at the address being checked, the handler must check R1 and R2 and, if it finds the expected numbers there, set R5 to zero. In the C code, the value of R5 is exposed through a special variable tightly bound to that register, while the address being checked reaches the assembly implicitly: we simply know that in arm-none-eabi the first function parameter is placed in R0.

    bool cpu_check_address(volatile const char *address)
    {
        /* Cortex-M0 doesn't have BusFault so we need to catch HardFault */
        (void)address;

        /* R5 will be set to 0 by HardFault handler */
        /* to indicate HardFault has occurred */
        register uint32_t result __asm("r5");

        __asm__ volatile (
            "ldr  r5, =1          \n" /* set default R5 value */
            "ldr  r1, =0xDEADF00D \n" /* set magic number     */
            "ldr  r2, =0xCAFEBABE \n" /* 2nd magic to be sure */
            "ldrb r3, [r0]        \n" /* probe address        */
        );

        return result;
    }

The HardFault handler code in its simplest form looks like this:

    __attribute__((naked)) void hard_fault_default(void)
    {
        /* Get stack pointer where exception stack frame lies */
        __asm__ volatile (
            /* decide if we need MSP or PSP stack */
            "movs r0, #4            \n" /* r0 = 0x4                         */
            "mov r2, lr             \n" /* r2 = lr                          */
            "tst r2, r0             \n" /* if (!(lr & 0x4))                 */
            "bne use_psp            \n" /* {                                */
            "mrs r0, msp            \n" /*   r0 = msp                       */
            "b out                  \n" /* }                                */
            " use_psp:              \n" /* else {                           */
            "mrs r0, psp            \n" /*   r0 = psp                       */
            " out:                  \n" /* }                                */

            /* catch intended HardFaults on Cortex-M0 to probe memory addresses */
            "ldr r1, [r0, #0x04]    \n" /* read R1 from the stack           */
            "ldr r2, =0xDEADF00D    \n" /* magic number to be found         */
            "cmp r1, r2             \n" /* compare with the magic number    */
            "bne regular_handler    \n" /* no magic -> handle as usual      */
            "ldr r1, [r0, #0x08]    \n" /* read R2 from the stack           */
            "ldr r2, =0xCAFEBABE    \n" /* 2nd magic number to be found     */
            "cmp r1, r2             \n" /* compare with 2nd magic number    */
            "bne regular_handler    \n" /* no magic -> handle as usual      */
            "ldr r1, [r0, #0x18]    \n" /* read PC from the stack           */
            "add r1, r1, #2         \n" /* move to the next instruction     */
            "str r1, [r0, #0x18]    \n" /* modify PC in the stack           */
            "ldr r5, =0             \n" /* set R5 to indicate HardFault     */
            "bx lr                  \n" /* exit the exception handler       */
            " regular_handler:      \n"
            /* here comes the rest of the fucking owl */
        );
    }

On entry to an exception handler, the Cortex core pushes onto the stack the registers that the handler is allowed to clobber (R0-R3, R12, LR, PC ...). The first fragment, which is already present in most ready-made HardFault handlers except those written for pure bare metal, determines which stack that is: under an OS it can be either MSP or PSP, and they have different addresses. In bare-metal projects the MSP (Main Stack Pointer) is usually assumed a priori, without checking, since the PSP (Process Stack Pointer) cannot be in use there for lack of processes.

Having determined the right stack and put its address in R0, we read the stacked R1 (offset 0x04) and R2 (offset 0x08) from it and compare them with the magic words. If both match, we read the stacked PC (offset 0x18), add 2 to it (2 bytes is the size of the instruction on Cortex-M *) and store it back on the stack. If this is not done, returning from the handler lands us on the very instruction that caused the exception, and we run in a circle forever. Adding 2 moves us to the next instruction at the moment of return.
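The offsets above follow from the layout of the stacked frame. As a rough illustration, here is the handler's check-and-skip logic modeled in plain C on a struct, so it can be tried on a host machine. The names `exc_frame` and `frame_handle_probe` are made up for this sketch; the real handler has to stay in assembly as shown earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame, in the order the core pushes it. */
struct exc_frame {
    uint32_t r0;   /* offset 0x00 */
    uint32_t r1;   /* offset 0x04 */
    uint32_t r2;   /* offset 0x08 */
    uint32_t r3;   /* offset 0x0C */
    uint32_t r12;  /* offset 0x10 */
    uint32_t lr;   /* offset 0x14 */
    uint32_t pc;   /* offset 0x18 */
    uint32_t xpsr; /* offset 0x1C */
};

_Static_assert(offsetof(struct exc_frame, pc) == 0x18, "PC must sit at 0x18");

#define PROBE_MAGIC_1 0xDEADF00Du
#define PROBE_MAGIC_2 0xCAFEBABEu

/*
 * Returns true if the fault was an intentional probe; in that case the
 * stacked PC is advanced past the 2-byte faulting instruction.
 */
bool frame_handle_probe(struct exc_frame *frame)
{
    if (frame->r1 != PROBE_MAGIC_1 || frame->r2 != PROBE_MAGIC_2) {
        return false; /* not ours: leave the frame untouched */
    }
    frame->pc += 2u;
    return true;
}
```

With both magic numbers in place and a stacked PC of `0x08000100`, the function returns true and leaves `0x08000102` in the frame; with either magic number missing, the frame is not modified.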

* Upd. A question came up in the comments about instruction size on Cortex-M, so I will put the correct answer here. In this case the fault is caused by the LDRB instruction, which exists in the ARMv7-M architecture in two versions, 16-bit and 32-bit. The second one is chosen if at least one of these conditions holds:

the author explicitly wrote LDRB.W instead of LDRB (we did not)
registers above R7 are used (we use R0 and R3)
the offset is greater than 31 bytes (we have no offset)

In all other cases (i.e. when the operands fit the format of the 16-bit version of the instruction), the assembler must choose the 16-bit version.

Therefore, in our case it is always a 2-byte instruction that has to be stepped over, but if you edit the code heavily, other outcomes are possible. On the Cortex-M0 and M0+ themselves (ARMv6-M), LDRB has only the 16-bit encoding anyway.
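If the probing code is likely to change, one way to avoid hard-coding the step is to look at the first halfword of the faulting instruction, since the Thumb encoding marks 32-bit instructions in its top five bits. This is a sketch and not part of the original handler; `thumb_insn_size` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Size in bytes of a Thumb instruction, judged by its first halfword:
 * top five bits of 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding,
 * anything else is a 16-bit instruction.
 */
uint32_t thumb_insn_size(uint16_t first_halfword)
{
    uint16_t top5 = (uint16_t)(first_halfword >> 11);
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

For example, `0x7803` is the 16-bit encoding of `ldrb r3, [r0]` and yields 2, while `0xF890`, the first halfword of `ldrb.w` with R0 as the base register, yields 4.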

Then we write 0 to R5, which serves as the indicator that we ended up in HardFault. Registers R4-R11 are not saved on the stack on entry and are not restored on exit from the handler, so whether or not to clobber them is on our conscience. Here we change R5 from 1 to 0 on purpose.

Returning from an exception handler is done in one strictly defined way. On entry to the handler, a special value called EXC_RETURN is written to LR. To leave the handler, this value must be written to PC, and not just written, but written with a POP or BX instruction (so “mov pc, lr”, for example, does not work, although at first you might think it does). BX LR looks like an attempt to jump to a meaningless address (LR will hold something like 0xFFFFFFF1, which has nothing to do with the real address of the code we need to return to), but in reality the processor, on seeing this value arrive in PC, restores the registers from the stack and resumes our function from the instruction after the one that caused the HardFault, because we increased the stacked PC by 2.
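The low bits of EXC_RETURN are what the first fragment of the handler tests when it chooses between MSP and PSP. A small host-side sketch of that decoding follows; the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 2 selects the stack the frame was pushed to: 0 = MSP, 1 = PSP. */
bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}

/* EXC_RETURN bit 3 selects the mode returned to: 0 = Handler, 1 = Thread. */
bool exc_return_to_thread_mode(uint32_t exc_return)
{
    return (exc_return & 0x8u) != 0u;
}
```

So `0xFFFFFFF1` means a return to Handler mode using MSP, `0xFFFFFFF9` a return to Thread mode using MSP, and `0xFFFFFFFD` a return to Thread mode using PSP.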

Where to read up on all the offsets and instructions is, of course, obvious.

And if the magic numbers are not found, execution goes to regular_handler, followed by the usual HardFault processing: as a rule, a function that prints register values to the console, decides what to do with the processor next, and so on.

Determining the RAM size

Using all of this is simple and straightforward. Want to write firmware that runs on several microcontrollers with different amounts of RAM and uses all of the RAM on each of them?

Easy:

    static uint32_t cpu_find_memory_size(char *base, uint32_t block, uint32_t maxsize)
    {
        char *address = base;
        do {
            address += block;
            if (!cpu_check_address(address)) {
                break;
            }
        } while ((uint32_t)(address - base) < maxsize);

        return (uint32_t)(address - base);
    }

    uint32_t get_cpu_ram_size(void)
    {
        return cpu_find_memory_size((char *)SRAM_BASE, 4096, 80*1024);
    }

maxsize is needed because, with the maximum possible amount of RAM, there may be no gap between the end of RAM and the next block of addresses, so cpu_check_address would never fail there. In this example it is 80 KB. There is also no point in probing every address: look up the smallest possible difference in RAM size between two models of the controller and use that as block.
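To see the effect of maxsize without hardware, the same loop can be run on a host against a simulated probe. This is only a sketch: `probe_fn`, `find_memory_size` and the two fake probes are names invented here, and on target the role of the probe is played by cpu_check_address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe callback standing in for cpu_check_address(). */
typedef bool (*probe_fn)(uintptr_t address);

/* Same loop as cpu_find_memory_size(), on integers and with a pluggable probe. */
uint32_t find_memory_size(uintptr_t base, uint32_t block, uint32_t maxsize, probe_fn probe)
{
    assert(block > 0u);
    uintptr_t address = base;
    do {
        address += block;
        if (!probe(address)) {
            break;
        }
    } while ((uint32_t)(address - base) < maxsize);

    return (uint32_t)(address - base);
}

/* Simulated part with 20 KB of RAM at 0x20000000 followed by a hole. */
bool fake_probe_20k(uintptr_t address)
{
    return address >= 0x20000000u && address < 0x20000000u + 20u * 1024u;
}

/* Simulated part whose RAM runs straight into another valid region. */
bool fake_probe_no_gap(uintptr_t address)
{
    (void)address;
    return true;
}
```

With `fake_probe_20k` the search stops at the hole and reports 20 KB. With `fake_probe_no_gap` nothing ever fails, and only maxsize ends the loop, at 80 KB.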

Jumping from software into a bootloader located who knows where

Sometimes you can pull off more intricate stunts. Imagine, for example, that you want to jump from software into the stock factory STM32 bootloader to enter firmware update mode over UART or USB, without bothering to write your own bootloader.

The STM32 bootloader lives in an area called System Memory, which is where you need to jump, but there is a problem: this area has different addresses not only on different processor series but also on different models of the same series (see the epic table in AN2606, pages 22 to 26). When you add this functionality to a platform in general, rather than to one specific product, you want it to be universal.

The start address of System Memory is also missing from the CMSIS files. Determining it from the Bootloader ID is not possible either, because that is a chicken-and-egg problem: the Bootloader ID lies in the last byte of System Memory, which brings us back to the question of the address.

However, the STM32 memory map does give us something to work with. What interests us here is what surrounds System Memory: above it there is a one-time programmable (OTP) area (not on all STM32) and the Option bytes (on all of them). This structure holds not only across different models but across different STM32 lines; the only differences are whether OTP is present and whether there is a gap in the addresses between system memory and the option bytes.

The most important thing for us, though, is that the start address of the Option bytes is present in the regular CMSIS headers, where it is called OB_BASE.

The rest is simple. We write a function that searches up or down from a given address for the first valid or invalid address:

    char *cpu_find_next_valid_address(char *start, char *stop, bool valid)
    {
        char *address = start;
        while (true) {
            if (address == stop) {
                return NULL;
            }
            if (cpu_check_address(address) == valid) {
                return address;
            }
            if (stop > start) {
                address++;
            } else {
                address--;
            }
        }

        return NULL;
    }

Then we search downward from the Option bytes in two passes: first for the end of either system memory or the OTP area adjacent to it, and then for the beginning of system memory:

    /* System memory is the valid area next _below_ Option bytes */
    char *a, *b, *c;
    a = (char *)(OB_BASE - 1);
    b = 0;

    /* Here we have System memory top address */
    c = cpu_find_next_valid_address(a, b, true);
    /* Here we have System memory bottom address */
    c = cpu_find_next_valid_address(c, b, false) + 1;
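The two passes are easy to try on a host against a simulated memory map. The layout below is an assumption chosen for illustration (system memory with OTP directly above it, then a gap, then the option bytes), and all `sim_`/`SIM_` names are invented for the sketch; check the reference manual for the real addresses on your part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed, illustrative layout: not taken from any header. */
#define SIM_SYSMEM_BASE 0x1FFF0000u
#define SIM_OTP_END     0x1FFF7A0Fu /* OTP sits directly above system memory */
#define SIM_OB_BASE     0x1FFFC000u
#define SIM_OB_END      0x1FFFC00Fu

/* Stand-in for cpu_check_address(): valid inside the two simulated regions. */
bool sim_check_address(uintptr_t a)
{
    return (a >= SIM_SYSMEM_BASE && a <= SIM_OTP_END) ||
           (a >= SIM_OB_BASE && a <= SIM_OB_END);
}

/* Same walk as cpu_find_next_valid_address(), on integers; 0 means "not found". */
uintptr_t sim_find_next(uintptr_t start, uintptr_t stop, bool valid)
{
    uintptr_t address = start;
    while (address != stop) {
        if (sim_check_address(address) == valid) {
            return address;
        }
        if (stop > start) {
            address++;
        } else {
            address--;
        }
    }
    return 0u;
}

/* Two passes down from the option bytes: top of the valid area, then its bottom. */
uintptr_t sim_find_sysmem_base(void)
{
    uintptr_t top = sim_find_next(SIM_OB_BASE - 1u, 0u, true);
    if (top == 0u) {
        return 0u;
    }
    uintptr_t below = sim_find_next(top, 0u, false);
    return (below == 0u) ? 0u : below + 1u;
}
```

On this simulated map the first pass lands on the last OTP byte and the second pass stops one byte below system memory, so `sim_find_sysmem_base()` yields the assumed base address.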

And without much difficulty we wrap this into a function that finds the beginning of system memory and jumps to it, that is, starts the bootloader:

    static void jump_to_bootloader(void) __attribute__ ((noreturn));

    /* Sets up and jumps to the bootloader */
    static void jump_to_bootloader(void)
    {
        /* System memory is the valid area next _below_ Option bytes */
        char *a, *b, *c;
        a = (char *)(OB_BASE - 1);
        b = 0;

        /* Here we have System memory top address */
        c = cpu_find_next_valid_address(a, b, true);
        /* Here we have the first invalid address below System memory */
        if (c) {
            c = cpu_find_next_valid_address(c, b, false);
        }

        /* Check before adding 1: NULL + 1 would no longer be NULL */
        if (!c) {
            NVIC_SystemReset();
        }

        /* System memory bottom address */
        uint32_t boot_addr = (uint32_t)(c + 1);
        uint32_t boot_stack_ptr = *(uint32_t*)(boot_addr);
        uint32_t dfu_reset_addr = *(uint32_t*)(boot_addr+4);

        void (*dfu_bootloader)(void) = (void (*)(void))(dfu_reset_addr);

        /* Reset the stack pointer */
        __set_MSP(boot_stack_ptr);

        dfu_bootloader();
        while (1);
    }
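Since the address is discovered by probing rather than taken from a datasheet, it may be worth sanity-checking the two words read from it before jumping. The check below is my own addition, not something the original code does, and `vector_table_looks_sane` together with its parameters is hypothetical. It relies on two facts: the initial stack pointer must point into RAM, and a Cortex-M code address must have bit 0 set because the core executes Thumb code only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sanity check on the first two vector table words.
 * ram_base and ram_size describe the RAM the bootloader stack should live in.
 */
bool vector_table_looks_sane(const uint32_t table[2], uint32_t ram_base, uint32_t ram_size)
{
    uint32_t initial_sp = table[0];
    uint32_t reset_handler = table[1];

    /* The stack pointer must fall inside RAM (the top address itself is allowed). */
    if (initial_sp < ram_base || initial_sp - ram_base > ram_size) {
        return false;
    }
    /* A Thumb code address must have bit 0 set. */
    if ((reset_handler & 1u) == 0u) {
        return false;
    }
    return true;
}
```

An erased region that reads as `0xFFFFFFFF` in both words fails the stack pointer check, which is a cheap way to avoid jumping into garbage.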

What here depends on the specific processor model? Well, nothing at all. The logic will not work on models that have a gap between OTP and system memory, but I have not checked whether any such models exist. If you actively work with OTP, check.

The remaining subtleties belong to the usual procedure for calling a bootloader from your own code. Do not forget to reset the stack pointer, and make the jump into the bootloader before initializing the processor's peripherals, clocks and so on: being minimalistic, the bootloader may skip peripheral initialization and expect everything to be in its default state. A good way to invoke the bootloader from an arbitrary place in your program is to write a magic number to an RTC backup register or simply to a known memory address, perform a software reset, and check for that number in the earliest stages of initialization.
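A minimal sketch of that magic-number handshake, assuming a single 32-bit location that survives a software reset. The magic value and the function names are invented for this example, and a plain variable stands in for the backup register so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary value chosen for this sketch. */
#define BOOT_REQUEST_MAGIC 0xB007C0DEu

/* Called from the application: leave the request, then trigger a reset. */
void boot_request_set(volatile uint32_t *backup_reg)
{
    *backup_reg = BOOT_REQUEST_MAGIC;
}

/*
 * Called first thing after reset. Clears the request so that the next
 * reset boots the application normally.
 */
bool boot_request_take(volatile uint32_t *backup_reg)
{
    if (*backup_reg != BOOT_REQUEST_MAGIC) {
        return false;
    }
    *backup_reg = 0u;
    return true;
}
```

On target, the early startup code would call `boot_request_take` and, if it returns true, jump to the bootloader before touching clocks or peripherals. Clearing the flag first keeps the device from looping into the bootloader on every reset.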

PS Since all the regions in the processor's memory map are aligned to 4 bytes in the worst case, the search described above can obviously be sped up by stepping through addresses 4 bytes at a time instead of one.
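One possible shape for the stepped search is sketched below, again on integers with a pluggable probe so it runs on a host. `find_next_stepped`, `addr_probe_fn` and `demo_region_probe` are hypothetical names, and the simulated region is an assumption made for the demo.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe callback standing in for cpu_check_address(). */
typedef bool (*addr_probe_fn)(uintptr_t address);

/*
 * Walk from start towards stop in step-byte strides and return the first
 * address whose validity equals valid, or 0 if nothing is found.
 */
uintptr_t find_next_stepped(uintptr_t start, uintptr_t stop, uintptr_t step,
                            bool valid, addr_probe_fn probe)
{
    if (step == 0u) {
        return 0u; /* a zero stride would never make progress */
    }
    bool upwards = stop > start;
    uintptr_t address = start;

    while (upwards ? (address < stop) : (address > stop)) {
        if (probe(address) == valid) {
            return address;
        }
        /* Stop rather than wrap around when fewer than step bytes remain. */
        uintptr_t remaining = upwards ? (stop - address) : (address - stop);
        if (remaining < step) {
            break;
        }
        address = upwards ? address + step : address - step;
    }
    return 0u;
}

/* Simulated map for the demo: one valid region, word-aligned at both ends. */
bool demo_region_probe(uintptr_t address)
{
    return address >= 0x1FFF0000u && address <= 0x1FFF7A0Fu;
}
```

Note that with a stride of 4 the start address should itself be word-aligned (for example `OB_BASE - 4` rather than `OB_BASE - 1`), and the bottom of the region is then the last invalid address plus 4 rather than plus 1.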

Important note

NB: on a specific controller, the validity of a specific address does not necessarily mean that the functionality which may live at that address is actually present. For example, the address of a register controlling some optional peripheral block may be valid even though the block itself is absent in this model. All sorts of interesting dirty tricks are possible on the manufacturer's side, usually rooted in the use of the same dies for different processor models. In most cases, however, these procedures work and are very useful.

Source: https://habr.com/ru/post/437256/