I have a problem: in one part of my program I use a union of a char * pointer and a plain char. I would like to print both of these members (interpreting them differently, of course). But if a character is stored in the union, the pointer must not be dereferenced, otherwise the program will crash. How can I find out in advance whether dereferencing will lead to a segfault?

Of course, I could add a separate field alongside the union that records what is stored in it. But that would mean rewriting most of the code, which is hardly acceptable, and surely there is a less invasive solution.

  • 1
    You can try a simple trick: since the character will most likely have a code up to 255, and a pointer within the first megabyte is probably invalid, an ordinary if will do. But this is a crutch. - KoVadim
  • 3
    @KoVadim, you have to guarantee that the value really is a character code. If the union held a pointer and we overwrote only one of its 4 or 8 bytes with a character, the number is still greater than 256 and most likely is still a valid pointer. - Qwertiy
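
Qwertiy's objection can be illustrated with a small C sketch. The helper name value_after_char_store is hypothetical. It imitates storing a char into a union that previously held a pointer by overwriting only the first byte of the pointer's storage. That matches what mainstream compilers do in practice, although the C standard leaves the remaining bytes unspecified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the integer value that a "pointer or char"
   slot would have after a char is written over a previously stored pointer.
   Assumption: the char occupies the first byte of the storage. */
static uintptr_t value_after_char_store(void *old_ptr, char c)
{
    unsigned char bytes[sizeof old_ptr];
    void *result;

    assert(old_ptr != NULL);
    memcpy(bytes, &old_ptr, sizeof old_ptr);  /* bytes of the stored pointer */
    bytes[0] = (unsigned char)c;              /* the char replaces one byte only */
    memcpy(&result, bytes, sizeof result);    /* read the storage as a pointer again */
    return (uintptr_t)result;
}
```

For the address of any ordinary object the result stays far above 255, so a plain "value < 256" test would classify the slot as a pointer even though a character was stored last.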

7 answers

It seems to me that there is no valid solution that works in 100% of cases, other than introducing an additional flag, which you have already mentioned.

  • 2
    Frankly, in my opinion it is easier to use an additional flag (even if that means rewriting a large part of the code :)) than to write code that dereferences "only when it is safe". Moreover, if I am not mistaken, dereferencing anything other than valid objects of the program itself is UB. - Harry
  • @Harry, about UB: we do not need to dereference the pointer in order to find out that it is invalid. And if there is no dereference, there is no UB. - Qwertiy
  • @Qwertiy And how do you read "I would like to print both of these members (interpreting them differently, of course)" in the question? Personally, I read it as printing the value itself and, if possible, the value at the address, that is, a dereference. - Harry
  • @Harry, I read it like this: we got a union, somehow magically found out whether it holds a character or a pointer, and printed it accordingly. The question is what that magical way of learning the type is :) - Qwertiy
  • 1
    @Qwertiy And we did not learn the type. At best we learned that the value can be interpreted as a pointer. Nothing more. - Harry
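
For reference, the extra flag discussed in this answer could look roughly like the sketch below. All names (value_kind, tagged_value, make_char, make_string, format_value) are hypothetical and not taken from the question's code. The point is only that the tag is written whenever the union is written and checked before the union is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tag: records which union member was stored last. */
enum value_kind { KIND_CHAR, KIND_STRING };

struct tagged_value {
    enum value_kind kind;
    union {
        char  c;
        char *p;
    } as;
};

static struct tagged_value make_char(char c)
{
    struct tagged_value v = { .kind = KIND_CHAR };
    v.as.c = c;
    return v;
}

static struct tagged_value make_string(char *p)
{
    assert(p != NULL);
    struct tagged_value v = { .kind = KIND_STRING };
    v.as.p = p;
    return v;
}

/* Writes a textual form of v into buf and returns the snprintf result.
   The pointer member is touched only when the tag says it is active. */
static int format_value(const struct tagged_value *v, char *buf, size_t size)
{
    switch (v->kind) {
    case KIND_CHAR:   return snprintf(buf, size, "%c", v->as.c);
    case KIND_STRING: return snprintf(buf, size, "%s", v->as.p);
    }
    return -1;
}
```

With this layout no guessing about the pointer's validity is needed: the code that stores a value also states what it stored.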

You can check whether the address is readable or writable, but it is still hard to conclude from that what is actually stored in your union.

Just in case, here is the code I used on Linux (and, oddly enough, also tested with MinGW on Windows XP):

    /* misc debug */
    #include <errno.h>
    #include <fcntl.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    #ifndef unix
    #define sigjmp_buf jmp_buf
    #define siglongjmp longjmp
    #define sigsetjmp(buf,flag) setjmp(buf)
    #endif

    #ifdef __cplusplus
    /* C++ has no nested functions, so the jump buffer and handler are file-static */
    static sigjmp_buf _av_is_accessible_jmp;
    static void _av_is_accessible_hdr (int sig) { siglongjmp(_av_is_accessible_jmp, 0); }
    #endif

    // check is addr readable/writable, returns 1 if so
    static int _av_is_mem_accessible (volatile char *addr, int writable)
    {
      if (addr == 0 || addr == (void *)-1LL)
        return 0;

      int rc = 1;

    #ifndef __cplusplus
      /* GNU C nested function: the jump buffer stays local to this call */
      sigjmp_buf _av_is_accessible_jmp;
      void _av_is_accessible_hdr (int sig) { siglongjmp(_av_is_accessible_jmp, 0); }
    #endif

      /* install the handler and remember the previous ones */
      void
    #ifdef unix
        (* sigbus)(int) = signal(SIGBUS, _av_is_accessible_hdr),
    #endif
        (* sigsegv)(int) = signal(SIGSEGV, _av_is_accessible_hdr);

    #ifdef DEBUG
    #ifdef unix
      if (sigbus == SIG_ERR) {
        fputs ("SIG_ERR signal SIGBUS\n", stderr);
        exit(1);
      }
    #endif
      if (sigsegv == SIG_ERR) {
        fputs ("SIG_ERR signal SIGSEGV\n", stderr);
        exit(1);
      }
    #endif

      if (sigsetjmp(_av_is_accessible_jmp, 1)) { // MANDATORY save sigmask
        /* we get here from the signal handler: the access failed */
        rc = 0;
        errno = EINVAL;
      } else {
        char t = *addr;   /* probe read */
        if (writable)
          *addr = t;      /* probe write; volatile keeps it from being optimized away */
      }

      /* restore the previous handlers */
    #ifdef unix
      signal(SIGBUS, sigbus);
    #endif
      signal(SIGSEGV, sigsegv);

      return rc;
    }

The idea is that we try to read a byte at the given address and, if that succeeds, write it back. On failure we intercept the signal and return the corresponding result.

P.S. With gcc (compiled as C) the code is reentrant and, in theory, thread-safe, but I could not achieve that for C++ ...

  • But can't the optimizer throw out the assignment *addr = t, so that write access is never actually checked? I also think there is a problem of a different kind: if the memory page being checked is private but shared with other processes via copy-on-write, the whole page will be copied unnecessarily, which hurts performance and wastes physical memory. - mymedia
  • @mymedia, oh! Thank you, you found a bug (with optimization). Already fixed. Adding volatile is enough and everything is OK (at least with gcc / g++). As for copy-on-write, IMHO one usually checks writability in order to write there afterwards, not out of idle curiosity, so the page will be duplicated anyway. And for pure analysis you can call the function from your answer. (In my case this code was used to check whether a constant string was passed or whether it can be modified.) - avp
  • Why is it impossible in C++? There is even a global segfault interceptor. Or extensions on top of try/catch. - pavel
  • @pavel, exactly: it is global (or file-static, as here). The problem with threads in the C++ variant is the static variable static sigjmp_buf _av_is_accessible_jmp; . For practical use of this function compiled as C++ it will do in a single-threaded program (it does not call any other user functions, so indirect recursion cannot occur). When I said that this approach does not work for C++, I had in mind the general case of such an approach. In C (more precisely, GNU C) it works, because there we can use nested functions. - avp

There is one way, although it is non-portable and works only on Linux. You need to look at the table of memory mappings of the process. If the memory the pointer refers to is marked as readable, the pointer can be dereferenced and the contents of that memory printed. If you need to write to that memory, look at the second flag.

Of course, this option is really suitable only for debugging purposes. But perhaps other systems also offer some way to check whether a pointer is valid.

The idea for the code was taken from the source of the pmap utility. More information about the format of the /proc/self/maps file can be found in the proc(5) manual page.

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    bool can_i_read(void* p)
    {
        uintptr_t begin, end;
        char readable, writable, executable, mapped;

        FILE* fp = fopen("/proc/self/maps", "r");
        if (!fp) {
            return false;
        }

        // each line starts with "begin-end rwxp"
        while (fscanf(fp, "%" SCNxPTR "-%" SCNxPTR " %c%c%c%c",
                      &begin, &end,
                      &readable, &writable, &executable, &mapped) == 6) {
            if (begin <= (uintptr_t)p && (uintptr_t)p < end) {
                fclose(fp);
                // to check write access instead,
                // look at the writable flag
                return readable == 'r';
            }
            // skip the rest of the line; this cannot loop forever,
            // because a newline always precedes the end of the file
            while (fgetc(fp) != '\n')
                ;
        }

        fclose(fp);
        return false;
    }

Tested on Ubuntu 14.04.4 / Linux 4.1.15.

  • The only question here is how long such a check takes. Clearly, for a single check this can be neglected. But what if we are talking about thousands of pointers? - avp
  • @avp, you can open the file once, at program start or on the first check, and seek to the beginning whenever a check is needed (if you want to optimize even further, you can use mmap and binary search). After all, this file gives direct access to kernel structures, so it should work quickly. - mymedia
  • In general, IMHO, it is good that they did not add an extra system call for this and used the existing file interface instead. - mymedia
  • 1
    mmap on /proc/... ??? Do you think this is possible? - avp
  • @avp, yes, indeed, mmap does not work :( - mymedia
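
The "open once, seek to the beginning on every check" idea from the comments might be sketched as follows. This is only an illustration under several assumptions: the name is_readable_cached is hypothetical, the whole table is assumed to fit into a fixed 64 KiB buffer, the function is not thread-safe because of its static state, and it uses a raw descriptor with lseek and read instead of stdio so that no library buffer stands between the check and the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if p lies inside a mapping marked readable.
   /proc/self/maps is opened on the first call and reused afterwards. */
static bool is_readable_cached(const void *p)
{
    static int fd = -1;
    static char buf[1 << 16];   /* assumption: the whole table fits here */
    uintptr_t addr = (uintptr_t)p;

    if (fd < 0) {
        fd = open("/proc/self/maps", O_RDONLY);
        if (fd < 0)
            return false;
    }
    if (lseek(fd, 0, SEEK_SET) < 0)   /* back to the start of the table */
        return false;

    size_t len = 0;
    ssize_t n;
    while (len < sizeof buf - 1 &&
           (n = read(fd, buf + len, sizeof buf - 1 - len)) > 0)
        len += (size_t)n;
    assert(len < sizeof buf);
    buf[len] = '\0';

    char *line = buf;
    while (*line != '\0') {
        char *nl = strchr(line, '\n');
        if (nl != NULL)
            *nl = '\0';               /* terminate the current line */

        uintptr_t begin, end;
        char perms[5];
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s",
                   &begin, &end, perms) == 3 &&
            begin <= addr && addr < end)
            return perms[0] == 'r';   /* first permission flag is "readable" */

        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return false;
}
```

Each call still re-reads the table, so the saving is only the repeated open and close. For thousands of checks in a row it may be worth parsing the table once into an array of ranges, at the risk of the data going stale when mappings change.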

At least in Windows, the first 64 kilobytes of the address space are reserved specifically to catch null pointer dereferences. And char is one byte, so the value in it will be less than 256. So when writing the character you should make sure the remaining bytes are cleared, and when reading, check whether the value exceeds 255.

Casting a pointer to an int is incorrect in 64-bit programs, so you need to either cast 256 to void * or use something like uintptr_t.
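
A minimal sketch of this scheme in C follows, with hypothetical names (char_or_ptr, store_char, holds_char, load_char). It relies on the same assumptions as the answer: no valid pointer has a value below 256, char is 8 bits wide, and integers survive a round trip through a pointer, which holds on mainstream platforms but is implementation-defined. The character is written through the pointer member as a small integer, so every byte of the union is defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

union char_or_ptr {
    char  c;
    char *p;
};

/* Store a character code (0..255) through the pointer member,
   which clears all remaining bytes of the union. */
static void store_char(union char_or_ptr *u, char c)
{
    assert(u != NULL);
    u->p = (char *)(uintptr_t)(unsigned char)c;
}

/* Assumption: values below 256 are never valid pointers. */
static bool holds_char(const union char_or_ptr *u)
{
    return (uintptr_t)u->p < 256;
}

static char load_char(const union char_or_ptr *u)
{
    assert(holds_char(u));
    return (char)(unsigned char)(uintptr_t)u->p;
}
```

Note that a null pointer and the character '\0' are indistinguishable here, and the check says nothing about whether a value of 256 or more is really a dereferenceable pointer.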

    If the program runs only on microcontrollers, any pointer can be dereferenced; some data will simply be returned. If the program runs under an OS without using dirty tricks, it is impossible to determine from inside the program whether a memory address is valid. You can only tell whether it is NULL.

    • In fact, on x86 or amd64 this is definitely possible. You only need to ask the operating system for help. The OS knows which virtual addresses of the process are mapped, and this information can be queried. See the example in my answer. - mymedia

    You can check it like this:

     #include <cstdint>
     #include <cstdlib>
     #include <iostream>
     using namespace std;

     union chars {
         char c;
         char* p;
     };

     void print(chars obj) {
         // values 0..255 are treated as a character, anything larger as a pointer
         if ((uintptr_t)obj.p > 255) {
             cout << obj.p[0] << endl;
         } else {
             cout << obj.c << endl;
         }
     }

     int main() {
         char text[] = "bcde";
         chars c_1;
         chars c_2;
         c_1.p = 0;      // clear every byte before storing the character
         c_1.c = 'a';
         c_2.p = text;
         print(c_1);
         print(c_2);
         system("PAUSE");
     }

    Cast to an integer type: if the value is in the range 0-255, the union holds a character; if it is larger, it holds a pointer. (If memory serves, pointers with three-digit values are reserved by the system, and dynamically allocated memory, such as for strings, will never get such addresses.)

    P.S. This solution is a "crutch"; see the comments for details.

    • 0. This is about Windows only. 1. Negative pointers are not always system ones: google the 3GB flag. 2. If an x64 pointer gets truncated to 4 bytes, nothing can be guaranteed at all. - Qwertiy
    • Unfortunately, your code does not even compile: you are trying to cast a pointer (which is 8 bytes on my machine) to an int (which is 4 bytes). You should use the special type uintptr_t, which is guaranteed to be large enough to hold pointer values. You can read more about it at cplusplus.com. You also use the system function from the standard <cstdlib> header without including it. - mymedia
    • But even with these errors corrected, the code will still not work correctly (it does not work for me). char can be either signed or unsigned and, moreover, is not required to occupy 8 bits. But that's not all: the size of a char is usually smaller than the size of a pointer, so when you assign a value of type char to the field c, it fills only part of the union, and the remaining bits keep the old bits of the pointer. - mymedia
    • @Qwertiy, as skegg pointed out above, a solution that works 100% of the time does not exist. There are already specific answers here, so this "crutch" also has a right to exist. I will take the comments into account and fix the answer. - Mirdin
    • @mymedia, the problem itself "asks for a crutch": in principle there can be no general solution, because if you don't know the type of a variable in a strongly typed language, you cannot work with it. - Mirdin

    Use the new C11 feature, _Generic:

     #define ischarprt(x) _Generic((x), \
         char: 0, \
         char *: 1, \
         default: -1)

    Then:

     char ch = 'c';
     char *prt = "pointer";
     int i = 0;

     if (ischarprt(prt) == 1) {
         // the data is a pointer to char
     }
     if (ischarprt(ch) == 0) {
         // the data is a char
     }

    Instead of numbers, the results can be strings or characters.

    UPD: here is the answer to your question - here

    • Wow! Yes, it can be used like overloading. - mymedia
    • @mymedia of course :) Does this solution work for you? - The_Netos
    • @mymedia, but in general, if you are trying to write a dynamic data container, nothing will help you except a current-type flag in the container. But there is one trick: the type flags can be defined via an enum and determined unambiguously through _Generic. Good luck 😊 - The_Netos
    • @mymedia I updated the answer; it now has a link to the solution to your question. - The_Netos
    • 1
      @The_Netos, no, for _Generic to work, the type needs to be known at compile time. - dzhioev
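
The enum trick mentioned in the comments, combined with dzhioev's remark, might be sketched like this. The names (kind, value, from_char, from_ptr, MAKE_VALUE) are hypothetical. _Generic selects a constructor from the static type of its argument at compile time, and the constructor records the tag, so the type information is still available at run time when the union is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type flags, stored next to the union. */
enum kind { KIND_CHAR, KIND_CHAR_PTR };

struct value {
    enum kind kind;
    union { char c; char *p; } as;
};

static struct value from_char(char c)
{
    struct value v = { .kind = KIND_CHAR };
    v.as.c = c;
    return v;
}

static struct value from_ptr(char *p)
{
    assert(p != NULL);
    struct value v = { .kind = KIND_CHAR_PTR };
    v.as.p = p;
    return v;
}

/* The constructor is chosen from the static type of x.
   Note: a character constant such as 'c' has type int in C,
   so pass a char variable, not a literal. */
#define MAKE_VALUE(x) _Generic((x), \
    char:   from_char,              \
    char *: from_ptr)(x)
```

_Generic only helps at the point where the value is stored; reading the union back still depends on the saved kind field, which is exactly the additional flag the question hoped to avoid.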