Hi,
I am working with the mkl_sparse_s_mm routine to perform:
C = A * B
where A is a sparse matrix and C,B are dense matrices. I would like to have some details about the algorithm for sparse-dense matrix multiplication implemented by mkl_sparse_s_mm, i.e., if it uses cache aware strategies, specific micro-kernel implementations to fully leverage CPU registers etc..
Just to provide an example, high performance dense matrix multiplication (cblas_gemm) is usually implemented following the block pack algorithm.
Thank you
Cosimo Rullli