If anyone is familiar with arithmetic coding, please tell me how to get out of this situation: when decoding, only about 13 characters come out correctly. The rest come out wrong because they no longer fall into their intervals. What should I do?
- Is the problem in your code? Show it. - Specter
- The problem is not in the code but in the algorithm: it involves dividing long fractional numbers, and as a result they stop falling into their intervals. - Ray
- Only a telepath can help you with that much information =) - Alexey Sonkin
1 answer
Without the code it is certainly hard to say, but the idea is quite simple.
Your message is encoded by some real (floating-point) data type. Clearly, the precision of that type depends directly on the number of bits allotted to the mantissa, and there is a theoretical limit beyond which operations on real numbers simply stop being exact.
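As a toy illustration (my own sketch, not the asker's code), here is how a double-precision interval collapses after a handful of symbols; the fixed 0.3-0.4 sub-interval below is an arbitrary stand-in for one symbol's probability range:

```python
# Narrow an arithmetic-coding interval repeatedly and count how many
# steps it takes until double precision can no longer tell the two
# endpoints apart.
low, high = 0.0, 1.0
steps = 0
while low < high:
    width = high - low
    # Pretend every symbol occupies the sub-interval [0.3, 0.4) of the
    # current range, i.e. each step shrinks the width tenfold.
    low, high = low + 0.3 * width, low + 0.4 * width
    steps += 1
print("interval collapsed after", steps, "symbols")
```

With a 52-bit mantissa this prints a number around 16-17, the same order of magnitude as the 13 characters from the question.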
The idea behind correct arithmetic coding of text of arbitrary length is to "pack" the maximum possible number of characters into one real value and then signal that no more characters fit into this number of bits without a possible loss of precision.
Obviously there is a theoretical limit on the number of such characters, which follows from the IEEE 754 specification and basic error analysis. A way to calculate this limit is given in some articles, for example in the references of the Wikipedia article.
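For a back-of-the-envelope estimate (mine, not taken from those articles): each encoded symbol multiplies the interval width by that symbol's probability, so after n symbols the width can be as small as p_min^n, where p_min is the smallest probability in the model. Decoding is unambiguous only while the width stays above the mantissa resolution, about 2^-52 for a double, which gives roughly n <= 52 / log2(1/p_min). For text carrying around 4-5 bits per character that works out to 10-13 characters, which is just what the asker observes.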
Now a little closer to practice.
You need to determine, one way or another, the exact upper bound on the number of characters that can be encoded using, say, a 32-bit floating-point type. You can either find this number empirically or derive an exact bound from the theory of the method.
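For example, the empirical route could look like this (a naive probe I sketched myself: the uniform 4-symbol model and the names ALPHABET, encode and decode are all my own assumptions, not the asker's code):

```python
import random

ALPHABET = "abcd"
P = {c: 1.0 / len(ALPHABET) for c in ALPHABET}  # assumed uniform model

def cum(model):
    # Build the table of cumulative sub-intervals [a, b) per symbol.
    lo, table = 0.0, {}
    for c, p in model.items():
        table[c] = (lo, lo + p)
        lo += p
    return table

TABLE = cum(P)

def encode(text):
    low, high = 0.0, 1.0
    for c in text:
        a, b = TABLE[c]
        w = high - low
        low, high = low + a * w, low + b * w
    return (low + high) / 2  # any point inside the final interval works

def decode(x, n):
    out, low, high = [], 0.0, 1.0
    for _ in range(n):
        w = high - low
        for c, (a, b) in TABLE.items():
            if low + a * w <= x < low + b * w:
                out.append(c)
                low, high = low + a * w, low + b * w
                break
        else:
            break  # x fell outside every sub-interval: precision is gone
    return "".join(out)

n = 1
while True:
    text = "".join(random.choice(ALPHABET) for _ in range(n))
    if decode(encode(text), n) != text:
        print("round trip first fails around N =", n)
        break
    n += 1
```

With a 64-bit double and this 2-bits-per-symbol model the round trip typically breaks somewhere in the twenties, which agrees with the 52 / log2(1/p_min) estimate above.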
I cannot say for sure (I lack both the knowledge and the desire to read the relevant articles) whether this estimate depends on the coding alphabet used; I suspect one can find some invariant that works for any coding table.
Well, then everything is simple: we know that, say, at most N characters fit into 32 bits while still decoding unambiguously. We encode those N characters into 32 bits and move on to the next N characters.
The decoder, accordingly, has to be told N and the size of a single word (32 bits) somehow. It is also clear that making these words, say, 64-bit is rather pointless if the FPU operates on a 32-bit type (we would just be wasting time).
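Put together, the framing could look like this (the header layout is purely my invention, and encode/decode are the helpers from the probe above; N is kept small enough for the value to survive rounding to a 32-bit float):

```python
import struct

def pack_stream(text, n=8):
    # Header: N, word size in bits, total character count.
    header = struct.pack("<BBI", n, 32, len(text))
    body = b"".join(struct.pack("<f", encode(text[i:i + n]))
                    for i in range(0, len(text), n))
    return header + body

def unpack_stream(data):
    n, bits, total = struct.unpack_from("<BBI", data, 0)
    out, off = [], struct.calcsize("<BBI")
    while total > 0:
        x, = struct.unpack_from("<f", data, off)
        off += bits // 8
        k = min(n, total)  # the last block may be shorter than N
        out.append(decode(x, k))
        total -= k
    return "".join(out)

assert unpack_stream(pack_stream("abcdabcdabcd")) == "abcdabcdabcd"
```

Note that the fixed overhead (the header plus the interval table, which the decoder also needs) is exactly the trade-off Ray objects to in the comment below; it only pays off if the table is shared across many blocks.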
I also recommend reading the sections of the Wikipedia article that explain precision and adaptive arithmetic coding.
- Then there will be no compression at all, because with double my encoding handles at most 10-13 characters. That is, I would have to write the final bound and the interval table to the file, and doing that for every 10 characters is pointless: it would not compress anything... - Ray