📜 ⬆️ ⬇️

Checking a CDK project with the IntelliJ IDEA static analyzer

I decided to test the IntelliJ IDEA static Java code analyzer and with its help checked the project The Chemistry Development Kit . Here I will give some errors that I found. I think that some of them are typical for Java-programs as a whole, therefore they may be interesting.


The Chemistry Development Kit is an open source Java library for solving problems of chemoinformatics and bioinformatics. When I was engaged in bioinformatics, we actively used it. The project has been developed for more than 20 years, it has dozens of authors, and the quality of the code there is very uneven. However, the project has unit tests, and pom.xml contains integration with the JaCoCo coverage analyzer . In addition, there are set up plug-ins of as many as three static analyzers: FindBugs , PMD , Checkstyle . More interesting to check, what warnings remain.


A static Java code analyzer built into IntelliJ IDEA is not inferior to specialized static analysis tools, and in some ways surpasses them. In addition, virtually all static analysis features are available in Community Edition , the free and open source IDE. In particular, the free version gives all the warnings described in this article.


By default, static analysis is performed continuously in the code editing mode, so if you write code in IntelliJ IDEA, you will correct a lot of errors literally seconds after they were allowed, even before running the tests. You can check the entire project or its part in batch mode using the Analyze | Inspect Code or start a separate inspection using Analyze | Run Inspection by Name . In this case, some inspections become available, which, because of the complexity, do not work in edit mode. However, there are few such inspections.


Many IntelliJ IDEA inspections do not report a bug, but rather an inaccuracy in the code, or offer a more idiomatic, beautiful, or quick alternative. This is useful when you are constantly working in the IDE. However, in my case it is better to start with the messages that warn about real bugs. Mostly interesting is Java | Probable Bugs , although there are other categories that should be explored, for example, Numeric Issues .


I will talk only about some interesting warnings.


1. Unary plus


Unary pluses plucked as much as 66 in the project. Write +1 instead of just 1 sometimes you want for beauty. However, in some cases, the unary plus comes out, if instead of += wrote =+ :


 int totalCharge1 = 0; while (atoms1.hasNext()) { totalCharge1 = +((IAtom) atoms1.next()).getFormalCharge(); } Iterator<IAtom> atoms2 = products.atoms().iterator(); int totalCharge2 = 0; while (atoms2.hasNext()) { totalCharge2 = +((IAtom) atoms2.next()).getFormalCharge(); } 

The obvious misprint, which resulted in ignoring all iterations of the loop except the last. It may seem strange that not “space equals plus space” is written, but “space equals space plus”. However, the strangeness disappears if you delve into history . Initially, "equal" and "plus" were really close, but in 2008 we went through an automatic formatter, and the code changed. Here, by the way, the moral for static analyzers is that it is reasonable to issue warnings based on strange formatting, but if the code is automatically formatted, the warnings will disappear and the bugs will remain.


2. Integer division with reduction to fractional


Pretty annoying mistake, but static analyzers find it well. Here is an example :


 angle = 1 / 180 * Math.PI; 

Unfortunately, the angle was not one degree at all, but zero. Similar error :


 Integer c1 = features1.get(key); Integer c2 = features2.get(key); c1 = c1 == null ? 0 : c1; c2 = c2 == null ? 0 : c2; sum += 1.0 - Math.abs(c1 - c2) / (c1 + c2); // integer division in floating-point context 

It seems that both the numbers c1 and c2 non-negative, which means that the modulus of the difference will never exceed the sum. Therefore, the result will be 0 if both numbers are nonzero, or 1 if one of them is 0.


3. Call Class.getClass ()


Sometimes people call getClass() on an object of type Class . The result is again an object of type Class with a constant value Class.class . This is usually an error: do not call getClass() . For example, here :


 public <T extends ICDKObject> T ofClass(Class<T> intf, Object... objects) { try { if (!intf.isInterface()) throw new IllegalArgumentException("expected interface, got " + intf.getClass()); ... 

If an exception happens, reporting it will be absolutely useless. By the way, errors in the error handling procedure are often found by static analysis in old projects: as a rule, error handling procedures are tested the worst.


4. Calling toString () on an array


This is a classic of the genre: toString () for arrays is not redefined, and its result is rather useless. Usually this can be found in diagnostic messages .


 int[] dim = {0, 0, 0}; ... return "Dim:" + dim + " SizeX:" + grid.length + " SizeY:" + grid[0].length + " SizeZ:"... 

It’s difficult to notice the problem because here dim.toString() implicit, but string concatenation delegates to it. Immediately proposed a fix - wrap in Arrays.toString(dim) .


5. The collection is read, but not filled.


This is also often found in the code base, which is not constantly checked by the static analyzer. Here is a simple example :


 final Set<IBond> bondsToHydrogens = new HashSet<IBond>(); // ... 220 строк логики, но bondsToHydrogens нигде не заполняется! for (IBond bondToHydrogen : bondsToHydrogens) // в цикл не зайдём sgroup.removeBond(bondToHydrogen); 

Obviously, filling just missed. In static analyzers, there are simpler tests that talk about an unused variable, but here the variable is used, so they are silent. We need a smarter inspection, which knows about the collection.


6. On the contrary: fill, but do not read


Reverse cases are also possible. Here is an example with an array :


 int[] tmp = new int[trueBits.length - 1]; System.arraycopy(trueBits, 0, tmp, 0, i); System.arraycopy(trueBits, i + 1, tmp, i, trueBits.length - i - 1); 

Inspection knows that the third argument of the arraycopy method is used only to write the array, and after that the array is not used at all. Judging by the logic of the code, the line trueBits = tmp; .


7. Integer versus ==


This is an insidious bug, because the small values ​​of Integer objects are cached, and everything can work well, until one day the number exceeds 127. This problem may not be noticeable at all :


 for (int a = 0; a < cliqueSize; a++) { for (int b = 0; b < vecSize; b += 3) { if (cliqueList.get(a) == compGraphNodes.get(b + 2)) { cliqueMapping.add(compGraphNodes.get(b)); cliqueMapping.add(compGraphNodes.get(b + 1)); } } } 

Well, it would seem that some objects are compared in some lists, maybe everything is fine. You must be careful to see that these lists are objects of type Integer.


8. Duplicate in Map


In this inspection, the picture is worth a thousand words. See a mistake ?


Is that better?


9. The result of the method is not used.


The result of some methods is silly not to use, as IDEA readily reports :


 currentChars.trim(); 

Probably meant currentChars = currentChars.trim(); . Since strings in Java are immutable, if the result is not reassigned, nothing will happen. There is also, for example , str.substring(2) .


By the way, this is a rather complicated inspection. In addition to a previously prepared list of methods, we sometimes try to automatically determine the methods whose result is worth using. It requires an interprocedural analysis, both in the source text and in the bytecode of the libraries. And all this is done on the fly in the process of editing the code!


10. Unreachable switch branches


 // if character is out of scope don't if (c > 128) return 0; switch (c) { case '\u002d': // hyphen case '\u2012': // figure dash case '\u2013': // en-dash case '\u2014': // em-dash case '\u2212': // minus return '-'; // 002d default: return c; } 

Since we have excluded characters with a code greater than 128, the \u2012-\u2212 unreachable. It seems not to be excluded.


11. Unreachable condition


An absolutely wonderful problem in the chain of conditions :


 if (oxNum == 0) { if (hybrid.equals("sp3")) { ... } else if (hybrid.equals("sp2")) return 47; } else if (oxNum == 1 && hybrid.equals("sp3")) return 47; else if ((oxNum == 2 && hybrid.equals("sp3")) || (oxNum == 1 && hybrid.equals("sp2")) || (oxNum == 0 && hybrid.equals("sp"))) // вот это вот недостижимо return 48; else if ((oxNum == 3 && hybrid.equals("sp3")) || (oxNum >= 2 && hybrid.equals("sp2")) || (oxNum >= 1 && hybrid.equals("sp"))) return 49; 

In complex conditional logic, this is not uncommon: we check a condition that cannot be true, because its fragment has already been checked above. Here we have a separate branch, oxNum == 0 , otherwise we check oxNum == 0 && hybrid.equals("sp") , which, of course, cannot be.


12. We write to the array of zero length


Sometimes IntelliJ IDEA will notice if you are writing to an array outside its size :


 Point3d points[] = new Point3d[0]; // завели массив на 0 элементов if (nwanted == 1) { points = new Point3d[1]; points[0] = new Point3d(aPoint); points[0].add(new Vector3d(length, 0.0, 0.0)); } else if (nwanted == 2) { // а тут пытаемся в него писать — исключение неминуемо points[0] = new Point3d(aPoint); points[0].add(new Vector3d(length, 0.0, 0.0)); points[1] = new Point3d(aPoint); points[1].add(new Vector3d(-length, 0.0, 0.0)); } 

13. Length check after indexing


Another common problem with the order of actions and again during error handling :


 public void setParameters(Object[] params) throws CDKException { if (params.length > 1) { throw new CDKException("..."); } if (!(params[0] instanceof Integer)) { // раз прочитали нулевой элемент throw new CDKException("The parameter must be of type Integer"); } if (params.length == 0) return; // то длина точно не нуль maxIterations = (Integer) params[0]; } 

In the case of an empty array, the author of the code wanted to go out quietly, but because of the verification, he would come out, loudly banging an ArrayIndexOutOfBoundsException. Obviously, the order of checks violated.


14. Check for null after a call.


And again, the order of actions is violated, this time with null :


 while (!line.startsWith("frame:") && input.ready() && line != null) { line = input.readLine(); logger.debug(lineNumber++ + ": ", line); } 

IDEA writes that line != null always true. It happens that verification is really redundant, but here the code looks as if null can really be.


15. Disjunction instead of conjunction


People often confuse logical operators AND and OR. The CDK project is no exception :


 if (rStereo != 4 || pStereo != 4 || rStereo != 3 || pStereo != 3) { ... } 

Whatever rStereo and pStereo are equal to, it is clear that they cannot be equal to the four and the three at the same time, therefore this condition is always true.


16. And again disjunction instead of conjunction


A similar error , but caught by another message:


 if (getFirstMapping() != null || !getFirstMapping().isEmpty()) { ... } 

We can get to the right side only if getFirstMapping() returns null , but in this case we are guaranteed a NullPointerException, which is what IDEA warns about. By the way, here we rely on the stability of the results of the getFirstMapping() method. Sometimes we use heuristics, but here the stability is directly analyzed. Since the class is final, the method cannot be redefined. IDEA checks its body return firstSolution.isEmpty() ? null : firstSolution return firstSolution.isEmpty() ? null : firstSolution and determines that stability is reduced to the stability of the Map#isEmpty method, which is preanantized as stable.


17. Hierarchy of interfaces and instanceof


When checking an object for belonging to any interface, do not forget that interfaces can inherit each other :


 if (object instanceof IAtomContainer) { root = convertor.cdkAtomContainerToCMLMolecule((IAtomContainer) object); } else if (object instanceof ICrystal) { root = convertor.cdkCrystalToCMLMolecule((ICrystal) object); } ... 

The ICrystal interface extends the IAtomContainer interface, so the second branch is obviously unreachable: if a crystal comes here, it will fall into the first branch.


18. Bypassing the empty list


Probably, the author of this code is not very familiar with the Java language:


 List<Integer> posNumList = new ArrayList<Integer>(size); for (int i = 0; i < posNumList.size(); i++) { posNumList.add(i, 0); } 

The size parameter in the ArrayList specifies the initial size of the internal array. This is used to optimize to reduce the number of allocations, if you know in advance how many elements are added there. At the same time, the actual elements in the list do not appear, and the size() method returns 0. Therefore, the next cycle of trying to initialize the elements of the list with zeros is completely useless.


19. Do not forget to initialize the fields


The analyzer checks constructors in a special way, taking into account field initializers. Due to this, the following error was found:


 public class IMatrix { public double[][] realmatrix; public double[][] imagmatrix; public int rows; public int columns; public IMatrix(Matrix m) { rows = m.rows; columns = m.columns; int i, j; for (i = 0; i < rows; i++) for (j = 0; j < columns; j++) { realmatrix[i][j] = m.matrix[i][j]; // NullPointerException гарантирован imagmatrix[i][j] = 0d; } } } 

In spite of the fact that the fields are public, no one could penetrate and initialize them before the constructor. Therefore, IDEA boldly warns that a call to an array element will cause a NullPointerException.


20. Do not repeat twice


Repeated conditions also occur frequently. Here is an example :


 if (commonAtomCount > vfMCSSize && commonAtomCount > vfMCSSize) { return true; } 

Such bugs are treacherous, because you never know, the second condition is simply superfluous, or the author wanted to check something else. If this is not fixed right away, then it can be hard to figure it out. This is another reason why static analysis should be used constantly.


I reported some of these bugs to the project's bug tracker . It is curious that when the authors of the project corrected a part, they themselves used the IntelliJ IDEA analyzer, found other problems about which I did not write, and also began to correct them . I think this is a good sign: the authors realized the importance of static analysis.



Source: https://habr.com/ru/post/436278/