In the third chapter of the above book, the implementation of the table of atoms is given. In particular, it shows the implementation of the function responsible for adding a new row to the table.

static struct atom { struct atom *link; int len; char* str; } *buckets[2048]; const char* Atom_new(const char* str, int len) { unsigned long h; int i; struct atom *p; assert(str); assert(len >= 0); for (h = 0, i = 0; i < len; i++) h = (h<<1) + scatter[(unsigned char)str[i]]; h %= NELEMS(buckets); for (p = buckets[h]; p; p = p->link) if (len == p->len) { for (i = 0; i < len && p->str[i] == str[i];) i++; if (i == len) return p->str; } p = (struct atom *)malloc(sizeof (struct atom) + len + 1); p->len = len; p->str = (char*)(p + 1); if (len > 0) memcpy(p->str, str, len); p->str = '\0'; p->link = buckets[h]; buckets[h] = p; } 

scatter is an array of unsigned long scatter[] containing 256 elements In the original, instead of p = (struct atom *)malloc(sizeof (struct atom) + len + 1); p = ALLOC(sizeof(*p) + len + 1) and an explanation is given that

The structure of the structure of the structure of the space structure and the structure of the structure of the structure of the structure.

so the replacement, if I understand correctly, is correct.

Specifically, I can not understand this line

 p->str = (char*)(p + 1); 

What does this shift by one mean? With such a shift, wouldn't the pointer point to memory outside the structure? Isn't it possible to do without this line, since we have allocated memory for the whole structure (including the string) and with the help of memcpy we just copy the string into p-> str?

  • p->str = '\0' ??? The code is written nonsense. Or you incorrectly reprinted. Either this book is a place in the furnace. - AnT
  • Yes, I was sealed, but the question was not this, especially since I already received an answer to it - Alexey

1 answer 1

What does this shift by one mean? With such a shift, wouldn't the pointer point to memory outside the structure?

Yes, this is how it was conceived, after the shift, the pointer will point outside the structure's border, and that is where the string will be stored, as described above. Note that the memory is allocated just by the structure itself, by the length of the string and by zero byte. And str will point to this memory. This gives a certain flexibility, albeit a very dubious one.

that is, on a 32-bit little-endian machine, the memory allocated for structures with strings 'qwerty' and ' asd ' after initialization may look something like this:

  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |f0|ff|00|00|00|00|00|06|xx|xx|xx|xx| q| w| e| r| t| y|00| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ vvv ^ | | | | +--link len str---------+ | | | | | struct atomic | | |___________________________________| | +->+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |00|00|00|00|03|00|00|00|fc|ff|00|00| a| s| d|00| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ vvv ^ | | | | link len str---------+ | | | struct atomic | |___________________________________| ^ ^ | | p (00 00 ff f0) (p+1) (00 00 ff fc) 

Isn't it possible to do without this line, since we have allocated memory for the whole structure (including the string) and with the help of memcpy we just copy the string into p-> str?

It is possible, if the string is always stored together with the heading of the list, then usually it is done a little differently (syntax C99):

 struct atom { struct atom *link; size_t len; char str[]; }; 

At the end of the structure, an array is declared without specifying the length; as a result, accessing it will immediately be treated as memory accesses for the main structure. The rest of the code is completely identical except for this line p->str = (char*)(p + 1); . Before C99, an array of variable or unit length was usually declared (which with modern compilers can lead to warnings).

PS: Probably further error, instead of p->str = '\0'; should be p->str[len] = '\0';

  • Wait a minute But if the memory is allocated at once to the whole structure along with the string, then doesn’t this mean that this shifted pointer will also point outside the row boundary? Or does +1 mean that the pointer is shifted by one byte? But then this is also nonsense since int takes 4 bytes and link takes 4 bytes. - Alexey
  • @Alexey, painted in detail, maybe it will be so clear ... (p+1) moves the pointer to sizeof(*p) == sizeof(struct atomic) byte. - Fat-Zer
  • Understood thanks. - Alexey