How not to litter in java

There is a popular misconception that if you don’t like the garbage collection, you should write not in Java, but in C / C ++. For the last three years, I have been writing low-latency Java code for currency trading, and I had to avoid creating unnecessary objects in every possible way. As a result, I formulated for myself a few simple rules on how to reduce allocations in Java, if not to zero, then to a certain reasonable minimum, without resorting to manual memory management. Perhaps someone from the community will also benefit.

Why avoid creating garbage at all

About what are the GC and how to customize them said and written a lot. But ultimately, no matter how you customize the GC - the code that litters will work suboptimally. There is always a trade-off between throughput and latency. It becomes impossible to improve one without deteriorating the other. As a rule, GC overheads are measured by studying the logs - it is possible to understand by what moments there were pauses and how long they took. However, the GC logs contain far from all the information about these overheads. The object created by the thread is automatically placed in the L1 cache of the processor core on which the thread is running. This leads to crowding out of other potentially useful data. With a large number of allocations, the payload can be pushed out of the L3 cache. The next time the thread accesses this data, the miss cache will occur, causing delays in the execution of the program. Moreover, since the L3 cache is common to all cores within the same processor, the trash will push data and other threads / applications out of the L3 cache, and they will already be confronted with extra expensive cache missas, even if they themselves are written in C and do not create garbage. No settings, no garbage collectors (neither C4 nor ZGC) will help to cope with this problem. The only way to improve the situation in general is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed.

Lyrical digression

Of course, you do not need to write all the code in the style of garbage free. The trick of the Java language is that you can greatly simplify your life by removing only the main sources of garbage. You can also not engage in safe memory reclamation when writing lock-free algorithms. If a certain code is executed only once at the start of the application, then it can allocate as much as necessary, and that is not a problem. Well, of course, the main working tool when getting rid of excess garbage is the allocation profiler.

Using primitive types

The simplest thing to do in many cases is to use primitive types instead of object ones. The JVM has a number of optimizations that minimize the overhead of object types, such as caching small values of integer types and inlining simple classes. But these optimizations are not always worth relying on, because they may not work: the integer value may not be cached, and inlineing may not occur. Moreover, when working with a conditional Integer, we are forced to follow the link, which potentially leads to a cache mission. Also, all objects have headers that take up extra space in the cache, displacing other data from there. Let's assume that a primitive int is 4 bytes. Object Integer occupies 16 bytes + the size of the reference to this Integer is 4 bytes minimum (in the case of compressed oops). In sum, the Integer takes up five (!) Times more space than the int . Therefore, it is better to personally use primitive types. I will give a few examples.

Example 1. Conventional calculations

Suppose we have a regular function that just counts something.

 Integer getValue(Integer a, Integer b, Integer c) { return (a + b) / c; }

Such code is most likely to be inlined (both the method and the classes) and will not lead to redundant allocations, but you cannot be sure of this. Even if this happens, there will be a problem with the fact that a NullPointerException can fly from here. One way or another, the JVM will either have to insert null checks under the hood, or somehow understand from the context that null cannot come as an argument. Anyway, it is better to just write the same code in primitives.

 int getValue(int a, int b, int c) { return (a + b) / c; }

Example 2. Lambda

Sometimes objects are created without our knowledge. For example, if we transfer primitive types to where object is expected. This often happens when using lambda expressions.
Imagine that we have the following code:

 void calculate(Consumer<Integer> calculator) { int x = System.currentTimeMillis(); calculator.accept(x); }

Despite the fact that the variable x is a primitive, an object of type Integer will be created that will be passed to the calculator. To avoid this, use IntConsumer instead of Consumer<Integer> :

 void calculate(IntConsumer calculator) { int x = System.currentTimeMillis(); calculator.accept(x); }

This code will not lead to the creation of an extra object. In java.util.function there is a whole set of standard interfaces adapted for using primitive types: DoubleSupplier , LongFunction , etc. Well, if something is missing, you can always add the necessary interface with primitives. For example, instead of BiConsumer<Integer, Double> you can use a homemade interface.

 interface IntDoubleConsumer { void accept(int x, double y); }

Example 3. Collections

Using a primitive type can be difficult because a variable of this type lies in some collection. Suppose that we have a certain List<Integer> and we want to find out what numbers are in it and count how many times each number is repeated. For this we use HashMap<Integer, Integer> . The code looks like this:

 List<Integer> numbers = new ArrayList<>(); // fill numbers somehow Map<Integer, Integer> counters = new HashMap<>(); for (Integer x : numbers) { counters.compute(x, (k, v) -> v == null ? 1 : v + 1); }

This code is bad in several ways. First, it uses an intermediate data structure, without which one could probably do without. Well, okay, for simplicity, we assume that this list will be needed later, i.e. it can not be completely removed. Secondly, in both places the object Integer instead of the primitive int . Thirdly, a lot of allocations occur in the compute method. Fourth, the iterator is allocated. This allocation is most likely to be inline, but nonetheless. How to turn this code into a garbage free code? You just need to use the collection on primitives from some third-party library. There are a number of libraries containing such collections. The following piece of code uses the agrona library.

 IntArrayList numbers = new IntArrayList(); // fill numbers somehow Int2IntCounterMap counters = new Int2IntCounterMap(0); for (int i = 0; i < numbers.size(); i++) { counters.incrementAndGet(numbers.getInt(i)); }

The objects that are created here are two collections and two int[] that are inside these collections. Both collections can be reused by calling their clear() method. Using collections on primitives, we did not complicate our code (and even simplified by removing the compute method with a complex lambda inside it) and received the following additional bonuses compared to using standard collections:

Almost complete lack of allocations. If the collection is reused, then there will be no allocation at all.
Substantial memory savings ( IntArrayList takes about five times less space than ArrayList<Integer> . As already mentioned, we care about the economical use of processor caches, and not RAM.
Sequential memory access. A lot has been written about why this is important, so I will not stop there. Here are a couple of articles: Martin Thompson and Ulrich Drepper .

Another small comment about the collections. It may be that the collection contains values of different types, and therefore it cannot be replaced with a collection with primitives. In my opinion, this is a sign of a poorly designed data structure or algorithm as a whole. Most likely in this case, the allocation of unnecessary objects is not the main problem.

Mutable objects

And what to do if primitives do not work? For example, if the method we need should return several values. The answer is simple - use mutable objects.

Small retreat

Some languages emphasize the use of immutable objects, such as Scala. The main argument in their favor is that writing multi-threaded code is greatly simplified. However, there are overhead costs associated with excessive waste allocation. If we want to avoid them, we should not create short-lived immutable objects.

How does it look in practice? Suppose we need to calculate the quotient and the remainder of the division. And for this we use the following code.

 class IntPair { int x; int y; } IntPair divide(int value, int divisor) { IntPair result = new IntPair(); result.x = value / divisor; result.y = value % divisor; return result; }

How can I get rid of allocation in this case? That's right, pass IntPair as an argument and write the result there. In this case, you need to write a detailed javadoc, and even better to use some kind of convention for the names of variables where the result is written. For example, you can start with the prefix out. Garbage free code in this case will look like this:

 void divide(int value, int divisor, IntPair outResult) { outResult.x = value / divisor; outResult.y = value % divisor; }

I want to note that the divide method should not save the link to the pair anywhere or pass it to the methods that can do this, otherwise we may have big problems. As we see, mutable objects are harder to use than primitive types, so if there is an opportunity to use primitives, then it is better to do so. In fact, in our example, we transferred the problem with allocation from the inside of the divide method to the outside. In all places where we call this method, we will need to have some kind of IntPair dummy, which we will pass to the divide . It is often enough to store this dummy in the final object field, from where we call the divide method. Let me give you a far-fetched example: suppose that our program does only that which receives a stream of numbers over the network, divides them and sends the result to the same socket.

 class SocketListener { private final IntPair pair = new IntPair(); private final BufferedReader in; private final PrintWriter out; SocketListener(final Socket socket) throws IOException { in = new BufferedReader(new InputStreamReader(socket.getInputStream())); out = new PrintWriter(socket.getOutputStream(), true); } void listenSocket() throws IOException { while (true) { int value = in.read(); int divisor = in.read(); divide(value, divisor, pair); out.print(pair.x); out.print(pair.y); } } }

For brevity, I did not write “extra” code for handling errors, correctly terminating the program, etc. The main idea of this piece of code is that the IntPair object we IntPair is created once and stored in the final field.

Object Pools

When we use mutable objects, we must first get an empty object from somewhere, then write the data we need into it, use it somewhere, and then return the object “in place”. In the above example, the object was always “in place”, i.e. in final field. Unfortunately, it is not always possible to do it in a simple way. For example, we may not know in advance exactly how many objects we need. In this case, object pools come to our rescue. When we need an empty object, we get it from the object pool, and when it ceases to be needed, we return it there. If there is no free object in the pool, the pool creates a new object. This is in fact a manual memory management with all the ensuing consequences. It is desirable not to resort to this method if it is possible to use the previous methods. What can go wrong?

We can forget to return the object to the pool, and then create garbage ("memory leak"). This is a small problem - the performance will slightly decrease, but the GC will work and the program will continue to work.
We can return the object to the pool, but save the link to it somewhere. Then someone else will get an object from the pool, and at this point in our program there will already be two links to the same object. This is a classic use-after-free problem. It is difficult to debug, because unlike C ++, the program will not fall with a segfolt, but will continue to work incorrectly .

In order to reduce the likelihood of making the errors described above, you can use the standard try-with-resources construction. It may look like this:

 public interface Storage<T> { T get(); void dispose(T object); } class IntPair implements AutoCloseable { private static final Storage<IntPair> STORAGE = new StorageImpl(IntPair::new); int x; int y; private IntPair() {} public static IntPair create() { return STORAGE.get(); } @Override public void close() { STORAGE.dispose(this); } }

The divide method might look like this:

 IntPair divide(int value, int divisor) { IntPair result = IntPair.create(); result.x = value / divisor; result.y = value % divisor; return result; }

And the listenSocket method listenSocket like this:

 void listenSocket() throws IOException { while (true) { int value = in.read(); int divisor = in.read(); try (IntPair pair = divide(value, divisor)) { out.print(pair.x); out.print(pair.y); } } }

In the IDE, as a rule, you can customize the highlighting of all use cases of AutoCloseable objects outside the try-with-resources block. But this is not an absolute option, because Highlighting in the IDE can simply be turned off. Therefore, there is another way to ensure that the object is returned to the pool - inversion of control. I will give an example:

 class IntPair implements AutoCloseable { private static final Storage<IntPair> STORAGE = new StorageImpl(IntPair::new); int x; int y; private IntPair() {} private static void apply(Consumer<IntPair> consumer) { try(IntPair pair = STORAGE.get()) { consumer.accept(pair); } } @Override public void close() { STORAGE.dispose(this); } }

In this case, in principle, we cannot access the object of the IntPair class IntPair outside. Unfortunately, this method does not always work either. For example, it will not work if one thread pulls objects from the pool and puts them in a queue, while another thread pulls them out of the queue and returns to the pool.

Obviously, if we store not self-written objects in the pool, but some library objects that do not implement AutoCloseable , then the try-with-resources option will not work either.

An additional problem here is multithreading. The implementation of the object pool must be very fast, which is quite difficult to achieve. A slow pool can do more harm for performance than good. In turn, the allocation of new objects in TLAB occurs very quickly, much faster than malloc in C. Writing a fast object pool is a separate topic that I would not like to develop now. I can only say that I did not see any good "ready-made" implementations.

Instead of conclusion

In short, reusing objects with object pools is a serious hemorrhoid. Fortunately, almost always you can do without it. My personal experience suggests that excessive use of object pools signals problems with the application architecture. As a rule, we only need one instance of the object cached in the final field. But even this is overkill if it is possible to use primitive types.

Update:

Yes, I remembered another way for those who are not afraid of bitwise shifts: packing several small primitive types into one big one. Suppose we need to return two int 's. In this particular case, you can not use the IntPair object, and return one long , the first 4 bytes in which will correspond to the first int 'y, and the second 4 bytes - to the second. The code might look like this:

 long combine(int left, int right) { return ((long)left << Integer.SIZE) | (long)right & 0xFFFFFFFFL; } int getLeft(long value) { return (int)(value >>> Integer.SIZE); } int getRight(long value) { return (int)value; } long divide(int value, int divisor) { int x = value / divisor; int y = value % divisor; return combine(left, right); } void listenSocket() throws IOException { while (true) { int value = in.read(); int divisor = in.read(); long xy = divide(value, divisor); out.print(getLeft(xy)); out.print(getRight(xy)); } }

Of course, such methods need to be thoroughly tested, because it is quite easy to jiggle when writing them. But then just use it.

Source: https://habr.com/ru/post/436024/