Why is BitSet implemented on a long array?

Question

For some reason, nowhere on the Internet is the choice of long as the backing type for BitSet explained. For example, why can't short be used instead? Suppose we fill the structure with N elements (N < 64); then, compared with short, the memory overhead turns out to be larger.

Mike · Accepted Answer · 2015-11-26T09:00:47

I don't know what BitSet is and I'm not familiar with Java. But if I were writing a base class for working with bits and did not make the size of the storage elements variable, I would pick the largest type that fits into a processor register on the target architecture. From the processor's point of view, an operation on a short and an operation on a long take the same time. In addition, accessing data in RAM that is not aligned to the width of the data bus usually takes longer. For this reason, data is usually aligned to the bus width. Aligning short values that way leaves unused space, which reduces the memory savings to zero.

Modern Intel architecture uses 64-bit registers and needs data in memory aligned to the same width for fast operation.
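To illustrate the point about register-width words, here is a minimal sketch of a bit set stored in a long array. The class `LongBitSetSketch` and its method names are made up for this example and are not the JDK source; the sketch only assumes that each long holds 64 bits, so the word index is the bit index divided by 64 and bulk operations touch 64 bits per step.

```java
// Illustrative sketch only: a tiny bit set backed by 64-bit words.
// The class and method names are hypothetical, not taken from the JDK.
final class LongBitSetSketch {
    private final long[] words;

    LongBitSetSketch(int nbits) {
        // One long holds 64 bits; round the word count up.
        words = new long[(nbits + 63) >>> 6];
    }

    void set(int bitIndex) {
        // bitIndex >> 6 selects the word. Java takes the shift count of
        // 1L << bitIndex modulo 64, which selects the bit inside that word.
        words[bitIndex >> 6] |= 1L << bitIndex;
    }

    boolean get(int bitIndex) {
        return (words[bitIndex >> 6] & (1L << bitIndex)) != 0;
    }

    // Intersection with another set: one AND instruction per 64 bits.
    void and(LongBitSetSketch other) {
        int common = Math.min(words.length, other.words.length);
        for (int i = 0; i < common; i++) {
            words[i] &= other.words[i];
        }
        // Words the other set does not have count as all zeros.
        for (int i = common; i < words.length; i++) {
            words[i] = 0L;
        }
    }

    // Number of set bits, counted one word at a time.
    int cardinality() {
        int sum = 0;
        for (long w : words) {
            sum += Long.bitCount(w);
        }
        return sum;
    }

    int wordCount() {
        return words.length;
    }
}
```

With a short[] as storage, the same 128-bit set would need 8 words instead of 2, so loops such as `and` and `cardinality` would run four times as many iterations for the same data.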

@kff, but shouldn't this be taken care of when compiling to native code?

Answer 2 · 2015-11-26T19:34:08

In terms of memory savings, the gain would be marginal: far more is spent on the object itself.

If you take JOL and look at how much memory an object consumes, then on a 32-bit JVM (OpenJDK 7u91 on i686 Ubuntu 14.04 in VirtualBox) the picture looks like this:

    Running 32-bit HotSpot VM.
    Objects are 8 bytes aligned.
    Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

    java.util.BitSet object internals:
     OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
          0     8         (object header)                N/A
          8     4     int BitSet.wordsInUse              N/A
         12     1 boolean BitSet.sizeIsSticky            N/A
         13     3         (alignment/padding gap)        N/A
         16     4  long[] BitSet.words                   N/A
         20     4         (loss due to the next object alignment)
    Instance size: 24 bytes (estimated, the sample instance is not available)
    Space losses: 3 bytes internal + 4 bytes external = 7 bytes total

8 bytes go to the object header, another 8 to the class fields (including the padding gap), and 4 bytes to the reference to the long array. Because objects are aligned to 8 bytes, another 4 bytes are lost to alignment.

    [S object internals:
     OFFSET  SIZE  TYPE DESCRIPTION                    VALUE
          0     8       (object header)                N/A
          8     4   int [S.length                      N/A
         12     0 short [S.<elements>                  N/A
         12     4       (loss due to the next object alignment)
    Instance size: 16 bytes (estimated, the sample instance is not available)
    Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

    [S@8b6c39d object externals:
      ADDRESS  SIZE TYPE PATH VALUE
     6f5b9950    16 [S        [5, 7]

An array is also an object and occupies 16 bytes even without data (8 for the header, 4 for the length field, 4 unused). A short[2] array stores its data in those last 4 bytes. A short[3] array already occupies 24 bytes.

That is, for N <= 32 (4 bytes of data) we could save 8 bytes, while still spending 40 bytes on the objects.

On a 64-bit JVM (JDK 1.8.0_45 on 64-bit Windows 7) even this saving disappears:

    Running 64-bit HotSpot VM.
    Using compressed oop with 0-bit shift.
    Using compressed klass with 0-bit shift.
    Objects are 8 bytes aligned.
    Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

    [S object internals:
     OFFSET  SIZE  TYPE DESCRIPTION                    VALUE
          0    12       (object header)                N/A
         12     4   int [S.length                      N/A
         16     0 short [S.<elements>                  N/A
    Instance size: 16 bytes (estimated, the sample instance is not available)
    Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

    [S@238e0d81d object externals:
      ADDRESS  SIZE TYPE PATH VALUE
     d5f14ce8    24 [S        [5, 7]

This is because the object header occupies 12 bytes, and a non-empty array gets at least another 8 bytes allocated in any case.
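The arithmetic behind both JOL dumps can be sketched as a small size model. This is a rough approximation, not HotSpot code: the class `ArraySizeModel`, its method names, and the rule that the first element is aligned to the element size are assumptions chosen to match the two configurations shown in the dumps (8-byte header on the 32-bit VM, 12-byte header on the 64-bit VM with compressed class pointers, 8-byte object alignment). For real measurements, JOL remains the tool to use.

```java
// Rough size model for primitive arrays; constants are taken from the JOL
// dumps in this answer, the formula itself is an assumption.
final class ArraySizeModel {
    static final int OBJECT_ALIGNMENT = 8; // "Objects are 8 bytes aligned"
    static final int LENGTH_FIELD = 4;     // int length stored after the header

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    // headerBytes: object header without the length field
    //              (8 in the 32-bit dump, 12 in the 64-bit dump).
    // elemBytes:   2 for short, 8 for long.
    static int arrayBytes(int headerBytes, int elemBytes, int length) {
        // Assumption: the first element starts at an offset aligned to the
        // element size, so long elements start at 16 in both configurations.
        int base = alignUp(headerBytes + LENGTH_FIELD, elemBytes);
        return alignUp(base + elemBytes * length, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        int[] headers = {8, 12}; // 32-bit VM, 64-bit VM with compressed klass
        for (int header : headers) {
            System.out.println("header=" + header
                    + " short[2]=" + arrayBytes(header, 2, 2)
                    + " short[3]=" + arrayBytes(header, 2, 3)
                    + " long[1]=" + arrayBytes(header, 8, 1));
        }
    }
}
```

Under these assumptions the model gives 16 bytes for short[2] and 24 bytes for long[1] with the 8-byte header, which is the 8-byte saving mentioned for the 32-bit VM, and 24 bytes for both with the 12-byte header, which is why the saving vanishes on the 64-bit VM.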

References:

Question on SO about the performance of long[] versus int[].
Question on SO about memory overhead.
