Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

How to store A to get the fastest performance of A^T*x using cblas_dgemv?


Hello,

I am using cblas_dgemv to compute A^T*x.  The matrix A is about 10000 rows x 20000 columns.  I am storing A in row-major format, so A(i,j+1) is stored next to A(i,j).
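To make the layout concrete, here is a minimal sketch in plain C (no MKL) of how a row-major matrix is indexed, together with a naive reference for y = A^T*x that could be used to check cblas_dgemv results on small inputs. The name `ref_atx_rowmajor` is hypothetical and not part of any library, and the loop is not meant to be fast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not part of MKL): y = A^T * x for a
 * row-major m x n matrix A with leading dimension lda >= n.
 * Element A(i,j) lives at a[i*lda + j], so A(i,j+1) is adjacent to it. */
static void ref_atx_rowmajor(size_t m, size_t n, const double *a, size_t lda,
                             const double *x, double *y)
{
    assert(lda >= n);

    for (size_t j = 0; j < n; ++j)
        y[j] = 0.0;

    /* Walk A one row at a time so memory is read contiguously. */
    for (size_t i = 0; i < m; ++i) {
        const double *row = a + i * lda;
        for (size_t j = 0; j < n; ++j)
            y[j] += row[j] * x[i];
    }
}
```

Here x has m entries and y has n entries, matching the 10000 x 20000 case with x of length 10000.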

My questions, all aimed at getting the fastest execution time, are as follows:

  1. Which is the better way to store A: row-major or column-major format? Does it matter?
  2. Is it better to store A and set TransA=CblasTrans, or to store A^T directly and use it with TransA=CblasNoTrans?
  3. If the answer to #2 is to use A^T directly, is it better to store A^T in row-major or column-major format?
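One point that may help frame these questions: a row-major A and a column-major A^T with the same leading dimension occupy exactly the same memory, so some of the options above describe the same buffer. The sketch below, in plain C with hypothetical helper names, checks that claim on the index arithmetic alone. It says nothing about which call is faster.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of A(i,j) in a row-major matrix with leading dimension lda. */
static size_t idx_rowmajor(size_t i, size_t j, size_t lda)
{
    return i * lda + j;
}

/* Offset of B(i,j) in a column-major matrix with leading dimension ldb. */
static size_t idx_colmajor(size_t i, size_t j, size_t ldb)
{
    return i + j * ldb;
}

/* Returns 1 if row-major A(i,j) and column-major A^T(j,i) map to the same
 * offset for every element of an m x n matrix A, and 0 otherwise. */
static int same_buffer(size_t m, size_t n, size_t ld)
{
    assert(ld >= n);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            if (idx_rowmajor(i, j, ld) != idx_colmajor(j, i, ld))
                return 0;
    return 1;
}
```

In cblas terms, this means `cblas_dgemv(CblasRowMajor, CblasTrans, m, n, 1.0, a, lda, x, 1, 0.0, y, 1)` and `cblas_dgemv(CblasColMajor, CblasNoTrans, n, m, 1.0, a, lda, x, 1, 0.0, y, 1)` describe the same computation on the same array `a`. Only storing A in the other orientation actually changes the memory access pattern.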

Another related question concerns byte alignment.  Suppose we store A in row-major format, with m rows and n columns.  I have read that, when multithreading with OpenMP, it helps to avoid false sharing if each row of A starts on a 64-byte-aligned boundary.  A common way of doing that is to pad the number of columns so that it is divisible by 8 (8 doubles = 64 bytes), so LDA = n + (8 - n%8).  Does doing this help dgemv run faster?
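As a small illustration of the padding arithmetic, here is a sketch of a rounding helper, assuming 8-byte doubles and a 64-byte cache line. The name `padded_lda` is hypothetical. Note that the formula n + (8 - n%8) adds a full extra 8 columns when n is already a multiple of 8 (for example n = 20000 gives 20008), whereas the round-up form below leaves such an n unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round n up to the next multiple of 8 doubles
 * (64 bytes, assuming sizeof(double) == 8 and 64-byte cache lines).
 * Returns n unchanged when it is already a multiple of 8. */
static size_t padded_lda(size_t n)
{
    const size_t doubles_per_line = 8;
    assert(n <= SIZE_MAX - (doubles_per_line - 1)); /* guard against overflow */
    return (n + doubles_per_line - 1) / doubles_per_line * doubles_per_line;
}
```

Padding the leading dimension only aligns every row if the base pointer is itself 64-byte aligned, for example when the buffer comes from an aligned allocator such as `mkl_malloc(size, 64)`.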

Finally, for my calculation I need alpha=1 and beta=0.  Does cblas_dgemv optimize for this trivial case, or does it still perform the extra (and in this case unnecessary) calculations?
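For reference, the standard BLAS definition of the operation dgemv performs with the transpose option is shown below, followed by what it reduces to for the values in the question. Whether a given implementation special-cases these values is not something the formula itself answers.

```latex
y \leftarrow \alpha A^{T} x + \beta y
\qquad\Longrightarrow\qquad
y \leftarrow A^{T} x \quad (\alpha = 1,\ \beta = 0)
```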

Thanks in advance for any help.

