Would it not be faster to replace it with a static lookup table (a `float[90000]` array) of the function's values on the interval [0°, 90°], given that the application performs many trigonometric operations that need no more than two decimal places of accuracy? The rest of the trigonometry would then be expressed through the new `sin` implementation. Such a table would take 4 bytes (the size of a `float`) × 90 degrees × 1000 steps per degree (three decimal places of the angle) ≈ 350 KB of memory. Is it worth it? What other pitfalls might this approach have?
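For concreteness, here is a minimal sketch of the lookup-table approach described in the question: precompute sine on [0°, 90°] in 0.001° steps and fold every other angle into that quadrant by symmetry. The class and method names are illustrative, not from any real library.

```java
public final class SinTable {
    private static final int STEPS_PER_DEGREE = 1000;
    // 90 * 1000 + 1 floats ≈ 352 KB, matching the estimate in the question
    private static final float[] TABLE = new float[90 * STEPS_PER_DEGREE + 1];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (float) Math.sin(Math.toRadians(i / (double) STEPS_PER_DEGREE));
        }
    }

    /** Sine of an angle given in degrees, reduced to the first quadrant by symmetry. */
    public static float sin(double degrees) {
        double d = degrees % 360.0;
        if (d < 0) d += 360.0;           // normalize to [0, 360)
        boolean negate = d >= 180.0;     // sin(x + 180°) = -sin(x)
        if (negate) d -= 180.0;
        if (d > 90.0) d = 180.0 - d;     // sin(180° - x) = sin(x)
        int idx = (int) Math.round(d * STEPS_PER_DEGREE);
        float v = TABLE[idx];
        return negate ? -v : v;
    }
}
```

One pitfall is already visible here: the index arithmetic and branching are not free, so whether this beats `Math.sin` is exactly the kind of thing that must be measured, not assumed.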

  • Out of curiosity, where did you actually run into sine performance as a bottleneck? :) - Costantino Rupert
  • I haven't run into it yet, but I'm thinking ahead: there are already 30 real-time trigonometry calls accumulated in a single pass of frame drawing, and that won't end well. - igumnov
  • The change is 15 minutes of coding. You can collect error statistics for the computation on random values; if it's within acceptable limits, I see no problem. - jmu
  • @igumnov, just measure whether it is actually faster, and publish the result here. - avp
  • @avp, yes, I'll probably do that later and post the results here; right now there's plenty of work without it. - igumnov
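A rough timing sketch along the lines @avp suggests. This is a deliberately naive `System.nanoTime` loop with a warm-up pass; for trustworthy numbers you would use JMH, since the JIT can otherwise distort or eliminate the measured work. All names here are illustrative.

```java
public final class SinBench {
    public static void main(String[] args) {
        final int N = 10_000_000;
        double sink = 0;                      // accumulate so the JIT cannot drop the loop

        // warm-up: let the JIT compile the hot loop before we time it
        for (int i = 0; i < N; i++) sink += Math.sin(i * 1e-4);

        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sink += Math.sin(i * 1e-4);
        long t1 = System.nanoTime();

        System.out.printf("Math.sin: %.1f ns/call (sink=%f)%n",
                          (t1 - t0) / (double) N, sink);
    }
}
```

Run the same loop body with the table-based `sin` instead of `Math.sin` and compare the two per-call figures on your target JVM and hardware.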

3 answers

  • Personally, it seems to me that sine performance can only become a bottleneck with a huge number of calculations, that is, in serious engineering computation. In that case accuracy usually matters a great deal, which makes the sine-caching approach impractical.

  • In that situation, moving the calculations into a native module where, for example, SIMD instructions can be used looks much more preferable.

Potentially, you could also run calculations of this kind on the GPU, which should be an order of magnitude better than the proposed caching trick.

  • It's hard to be sure that the JIT will generate good code for the proposed optimization. For things like this, it makes sense to think about questions such as:
  • How does the precomputed table fit into the processor cache?
  • What machine code will be generated for each function call?
  • Reasoning about such things through a thick layer of JIT and bytecode is extremely difficult, since Java simply is not designed for micro-optimizations.

  • Here, by the way, are some more interesting discussions on the topic:


  • (Update) Since we are talking about gamedev, it makes perfect sense to push the heavy calculations into a shader.
  • Yes, the method is native and written either in C or in assembler. As an option, write your own implementation and compare performance, though I doubt such things were written by amateurs. - Viacheslav

As someone who works with numerical methods, I will say this:

  1. Mathematical functions in Java are not implemented in pure Java but via calls to native (C) functions. Just look at the Java sources, where this is clearly visible: proof link
  2. In the native part, the implementation of sin/cos depends on the platform. For x86-family processors it is implemented in assembler using the fsin/fcos instructions built into the FPU.
  3. For other processors, Sun (or rather, Oracle nowadays) uses the FDLIBM library, well known in the narrow circles of numerical-methods people.
  4. The sine source itself shows that it uses power polynomials of degree 13.
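To illustrate the degree-13 polynomial structure: the sketch below uses plain Taylor coefficients of sine about zero, evaluated in the same nested form as the FDLIBM kernel (FDLIBM itself uses slightly adjusted minimax constants rather than these exact values, and applies range reduction so that |x| ≤ π/4 first).

```java
public final class KernelSin {
    // Taylor coefficients of sin around 0 (FDLIBM tunes these minimax-style)
    private static final double S1 = -1.0 / 6;            // -1/3!
    private static final double S2 =  1.0 / 120;          //  1/5!
    private static final double S3 = -1.0 / 5040;         // -1/7!
    private static final double S4 =  1.0 / 362880;       //  1/9!
    private static final double S5 = -1.0 / 39916800.0;   // -1/11!
    private static final double S6 =  1.0 / 6227020800.0; //  1/13!

    /** Degree-13 odd polynomial in x; accurate only for |x| <= pi/4. */
    public static double sin(double x) {
        double z = x * x;
        return x + x * z * (S1 + z * (S2 + z * (S3 + z * (S4 + z * (S5 + z * S6)))));
    }
}
```

Within [-π/4, π/4] this already agrees with `Math.sin` to roughly 14-15 significant digits, which is why a 13th-degree polynomial suffices once the argument has been range-reduced.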

And the actual answer to the author's question: no, replacing the sine will not make it faster :)

  • Cool answer! - avp

I don't need to be Baba Vanga (the famous clairvoyant) to say that it is implemented there via a Taylor series expansion. And if the folks at Sun really made an effort, they replaced the Taylor expansion coefficients with minimax (Chebyshev) ones.

If you need to compute angles with low accuracy (2 decimal places, as you wrote), it is better to do it with this algorithm: CORDIC.
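A minimal floating-point sketch of CORDIC in rotation mode, assuming the angle has already been reduced to about [-π/2, π/2]. Real CORDIC implementations use integer arithmetic with bit shifts instead of `Math.pow`; this version trades that away for readability. With 16 iterations the angular resolution is about 2⁻¹⁵, comfortably enough for two decimal places.

```java
public final class Cordic {
    private static final int ITERATIONS = 16;
    private static final double[] ATAN = new double[ITERATIONS];
    private static final double K;   // reciprocal of the cumulative rotation gain

    static {
        double gain = 1.0;
        for (int i = 0; i < ITERATIONS; i++) {
            ATAN[i] = Math.atan(Math.pow(2, -i));
            gain *= Math.sqrt(1 + Math.pow(2, -2 * i));
        }
        K = 1.0 / gain;
    }

    /** sin of an angle in radians; valid for roughly |angle| <= pi/2. */
    public static double sin(double angle) {
        double x = K, y = 0, z = angle;
        for (int i = 0; i < ITERATIONS; i++) {
            double d = z >= 0 ? 1 : -1;           // rotate toward zero residual angle
            double nx = x - d * y * Math.pow(2, -i);
            double ny = y + d * x * Math.pow(2, -i);
            z -= d * ATAN[i];
            x = nx;
            y = ny;
        }
        return y;   // x holds cos(angle) at this point
    }
}
```

The appeal for low-precision work is that each iteration adds roughly one bit of accuracy, so you dial the iteration count to the precision you actually need.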

  • You will be surprised to learn that the Taylor series is practically never used in computational methods because of its inconvenience: it gives a good approximation only near a specific point, not over the whole range. - Barmaley
  • I would not be surprised, because it is in fact used there often. If you don't believe me, look at the sources of, say, GLIBC. True, well-designed libraries look for Chebyshev minimax coefficients for their polynomials, to spread the error uniformly over the whole chosen range; for sin that range is [-π/4, π/4]. Where a Taylor series is used, range reduction is applied to minimize the approximation error near the chosen point. Then, knowing the allowed deviation, it is not hard to compute the maximum degree of the polynomial and evaluate it with a parallel algorithm such as Estrin's scheme. - Jack Black
  • To avoid being unfounded, here is a good reference on computational methods: research.scea.com/gdc2003/fast-math-functions.html It discusses in detail the problems of computing the values of elementary functions, their Taylor series expansions, and finding minimax polynomials, as well as various range-reduction schemes and parallel evaluation schemes. An excellent guide for anyone who decides to work in this area. I borrow ideas from there whenever I need to write something fast with the required precision. In general, it all depends on the task and on how best to implement it in hardware; there are no universal schemes yet. - Jack Black
  • @JackBlack Well, I wouldn't say that GLIBC is a good implementation of numerical methods - it's all rather amateur-level, so not very relevant. Regarding Chebyshev - yes, here you are right: Chebyshev polynomials are ideally suited for numerical methods (as the Russian school showed). P.S. The answer is a bit late :) - Barmaley
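To illustrate the Estrin's scheme mentioned in the comments above: unlike Horner's rule, whose multiply-add chain is strictly sequential, Estrin pairs adjacent terms so several sub-expressions are independent and can proceed in parallel on a superscalar CPU or SIMD unit. A sketch for a degree-7 polynomial (names are illustrative):

```java
public final class Estrin {
    /**
     * Evaluate c[0] + c[1]*x + ... + c[7]*x^7 with Estrin's scheme:
     * the four (c[2i] + c[2i+1]*x) pairs are mutually independent,
     * then pairs are combined using the precomputed powers x^2 and x^4.
     */
    public static double eval7(double[] c, double x) {
        double x2 = x * x;
        double x4 = x2 * x2;
        double p01 = c[0] + c[1] * x;   // these four lines have no
        double p23 = c[2] + c[3] * x;   // data dependencies between
        double p45 = c[4] + c[5] * x;   // them, so they can execute
        double p67 = c[6] + c[7] * x;   // in parallel
        return (p01 + p23 * x2) + (p45 + p67 * x2) * x4;
    }
}
```

The result is numerically the same polynomial as Horner's rule evaluates; only the association order, and hence the dependency chain length, differs.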