How do I quickly multiply two matrices stored in dynamic C arrays?

Closed as unclear by Vlad from Moscow, αλεχολυτ, Kromster, HamSter, fori1ton on 15 Nov '16 at 6:31.


  • The question as asked is incomplete. Multiplication speed depends on a number of factors: the size of the matrices, the nature of their elements (including their size), the access pattern, the storage layout, the ring in which the calculations are performed; these are just some of the questions that need answering first. - Zealint
  • @Zealint I just need an algorithm, at least for square matrices; everything else is standard. - nikita

3 answers

The standard formula uses the index `k` asymmetrically:

    a[i][k] * b[k][j]

As `k` runs, `a[i][k]` walks along a row, while `b[k][j]` jumps down a column with a large stride.

You can get a performance improvement by transposing the second matrix:

    a[i][k] * b[j][k]

Now both operands are read sequentially along rows: you can hoist the row pointers out of the inner loop to avoid repeated dereferencing, and the data streams through the processor cache properly.


If the matrices are large enough, it makes sense to consider asymptotically faster algorithms, for example Strassen's.

  • @nikita, transposition is quadratic, while the multiplication is cubic. I mean an in-place transposition; if an extra allocation is needed, the benefit is less clear. In general, it is better to measure. But if there is an option to build the second matrix transposed from the start and nothing else suffers, that should be used. (And above I mistakenly called transposition "inversion"...) - Qwertiy
  • Which pointer can be saved? - nikita
  • @nikita, a[i] and b[j] . Though I think the compiler will figure that out itself. - Qwertiy
  • An experiment showed that at realistic sizes (100x100) transposing gives practically nothing... A gain of about 5% does not pay for the long, laborious transposition of the matrix. - Harry
  • @Harry, 5% at n = 100, 15% at n = 200, 30% at n = 500. And hoisting the pointer by hand gives nothing by itself; the compiler is smart enough on its own: ideone.com/QoW7x9 If you measured incorrectly, correct it. - Qwertiy

Most likely, what you want is ordinary matrix multiplication.


The classic formula is

    c[i][j] = Σ_k a[i][k] * b[k][j]

"Accelerated" algorithms of the Strassen type make no sense for ordinary, everyday tasks...

  • It makes sense to transpose one of the matrices, as far as I know. - Qwertiy
  • For better cache locality? In principle, yes, but at realistic sizes the matrices usually fit in the cache anyway. Still, it would be worth experimenting sometime... - Harry
  • Not only that. We also save a pointer dereference. I did write that in my answer. - Qwertiy
  • It's dynamically allocated memory: there is no guarantee at all that adjacent rows lie next to each other. - Qwertiy
  • Well, if someone is foolish enough not to allocate the memory as a single block, who can help him?.. - Harry

"The fastest": take a specialized library such as MKL, ATLAS, or ViennaCL. First, they implement BLAS, which is the standard toolkit for working with matrices; second, they are genuinely well optimized (ViennaCL even runs on the GPU), which lets them run much faster than a naive implementation.