I'll add a couple of clarifications.
As far as I understand, the question really amounts to something slightly different: "if the cache coherence protocol already forces the processor caches to keep a memory cell in a consistent state, why do we need volatile, which seems to do the same thing?"
First, these are two different levels. The JLS operates inside the JVM; a cache coherence protocol exists only in a specific processor architecture. Cache coherence does not have to be present on the architecture the JVM is compiled for and runs on, so the JLS turns an optional feature into a mandatory one (in fact, the guarantee is a bit more than just coherence, see below). I'm pretty sure 99% of today's multi-core processors have such a protocol, but Java cannot rely on something that is not guaranteed: all Java applications are supposed to behave identically on all architectures (except when interacting with the OS, where behavior may legitimately differ). So the JLS was practically obliged to introduce such a concept, even though most systems provide it out of the box, because even a JVM implemented in, say, Python still has to execute the code the same way as any other.
Second, if we take the definition from Wikipedia:
> (free translation) a multiprocessor cache is coherent if all write operations to the same memory location are performed in some sequential order
Here it is worth paying attention to "the same memory location". Java has data types that can occupy more than one of the machine words the processor operates on: at least on a 32-bit system, double and long each take up two words. If I understand everything correctly, the following situation may then arise on such a system:
```
cache line 1: <other data><upper or lower 32 bits of the double>
cache line 2: <the rest of the double><other data>
```
In this case, even under strict cache coherence the processor is entitled to update exactly half of the double, so other threads may legitimately see garbage instead of a real value. volatile forbids this situation by guaranteeing that writes to such a variable are atomic.
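To make this concrete, here is a minimal sketch of the torn-read hazard (class and field names are my own illustration; whether a tear actually occurs depends on the JVM and hardware — most 64-bit JVMs write long atomically anyway):

```java
public class TornReadDemo {
    // Without volatile, the JLS (§17.7) permits a write to a long to be split
    // into two 32-bit halves; declaring the field volatile forbids the tear.
    static long value = 0L;

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (true) {
                value = 0L;   // all zero bits
                value = -1L;  // all one bits
            }
        });
        writer.setDaemon(true);
        writer.start();

        for (long i = 0; i < 1_000_000_000L; i++) {
            long v = value;
            // Anything other than 0 or -1 is a torn read: half old, half new.
            if (v != 0L && v != -1L) {
                System.out.println("torn read: " + Long.toHexString(v));
                return;
            }
        }
        System.out.println("no tear observed (likely a 64-bit JVM with atomic long writes)");
    }
}
```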
Third, besides purely hardware problems, the compiler (indirectly) takes part in code execution. I do not know how applicable this is to modern Java, but an aggressive compiler is entitled to apply the following optimization:
```
boolean flag = true;
while (flag) {
    doProcessing();
}

// hmm, flag is not marked volatile, so the programmer assumes it is only updated locally;
// I'll cache it in a CPU register, that will be faster
eax = load(flag);
while (eax) {
    doProcessing();
}
```
The register will never be updated, and that has nothing to do with the cache coherence protocol. Again, I don't know how existing Java compilers actually behave, but this very example is given in the JLS as unsafe.
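A runnable sketch of this hazard (the hang is likely on a typical HotSpot JIT but not guaranteed, since the JLS merely permits the hoisting):

```java
public class HoistingDemo {
    static boolean running = true; // try adding volatile and compare

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // Empty body: the JIT may hoist the read of running out of
                // the loop, effectively turning it into while (true).
            }
            System.out.println("worker observed running = false");
        });
        worker.setDaemon(true);
        worker.start();

        Thread.sleep(100);   // give the JIT time to compile the hot loop
        running = false;     // without volatile the worker may never see this
        worker.join(1000);
        System.out.println(worker.isAlive() ? "worker is still spinning" : "worker stopped");
    }
}
```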
And finally, volatile semantics constrain the order in which the program executes. The JLS requires the following:
- All actions within a single thread have a happens-before relationship with each other in program order, i.e. the result of an earlier statement is always visible to the statements below it in the same thread.
- Actions on a volatile field have a happens-before relationship with each other: once someone writes a value to a volatile field, no subsequent read is allowed to see the outdated value.
- The happens-before relation is transitive: if action A happens-before B, and B happens-before C, then A happens-before C, so C will see all changes made by A.
The compiler, the JVM and the processor are free to reorder expressions as they please, as long as these conditions are met. Take the following code:
```
int result = 0;
boolean done = false;
....
this.result = 1;
this.done = true;
```
It may legitimately be turned into
```
this.done = true;
this.result = 1;
```
because all subsequent expressions in the same thread will still see the same result. In this example, another thread that has seen done = true can still read 0 from result. However, if you declare done volatile, then the write to result must occur before the write of true to done, and a read that observes true must occur after that write, which guarantees the visibility of the changes to listening threads. This does not rule out that more than one write to result happens in the meantime; it only guarantees that by the time done is read as true, result holds the value written before the done update, or a later one.
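Put as code, this classic safe-publication pattern looks roughly like this (a minimal sketch; the class and method names are mine):

```java
class Publisher {
    int result = 0;
    volatile boolean done = false;

    void writer() {
        result = 1;   // plain write
        done = true;  // volatile write: happens-before any read that sees true
    }

    void reader() {
        if (done) {
            // Transitivity: the write to result happens-before the volatile
            // write to done, which happens-before this read, so we see 1
            // (or a later value), never the initial 0.
            System.out.println(result);
        }
    }
}
```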
Update
In addition to all of the above, there is one more amusing case. Java frankly suffers from large heaps, or more precisely from GC pause times on such heaps. Naturally, people try to attack this problem with collectors that run concurrently with the application. One tactic here is evacuating live objects out of the region being reclaimed, so that the region can then simply be declared free for reuse. At such a moment the JVM may contain two copies of one object (one at the old address, one being evacuated) whose reads and writes need to be kept in agreement. Fortunately for the implementers, the JMM promises nothing for plain reads, so most operations can be freed from synchronization, and at some point all writes may go to one copy while reads are served from the other, until access is synchronized. Like all the examples above, this is in full agreement with cache coherence, yet still allows anomalies while the application runs (and for the same reason: cache coherence works at the level of individual memory blocks, the JVM at the level of objects and fields). This paragraph refers to Shenandoah GC, which is expected in Java 10, but similar ways to shoot yourself in the foot can easily be expected elsewhere.
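The latitude being exploited here is easy to show at the language level (a hedged sketch of my own, not Shenandoah internals): the JMM simply never promises that a plain read observes the latest write:

```java
public class PlainReadDemo {
    static int data = 0; // plain field: no cross-thread visibility guarantees

    public static void main(String[] args) {
        new Thread(() -> data = 42).start();
        // No happens-before edge links the write above to this read, so the
        // JMM permits printing 0 even if the write has already executed.
        // A concurrent evacuating GC enjoys the same latitude: a plain read
        // may be served from the "old" copy of an object.
        System.out.println(data);
    }
}
```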