Here is the code:
#include <cmath>
#include <cstdio>
#include <mkl.h>

// Returns the largest absolute value in vec[0..sz).
float max_val(const float * vec, size_t sz)
{
    float result = 0.0f;
    for (size_t i = 0; i < sz; ++i) {
        // std::fabs keeps the value as a float; plain abs() may pick the int version.
        float val = std::fabs(vec[i]);
        if (val > result)
            result = val;
    }
    return result;
}

int main()
{
    const int M = 64;
    const int N = 50176;
    const int K = 576;
    const float alpha = 1.0f;
    const float beta = 0.0f;

    // 32-byte aligned buffers: A is M x K, B is K x N, C is M x N.
    float *A, *B, *C;
    A = (float *)mkl_malloc( M*K*sizeof( float ), 32 );
    B = (float *)mkl_malloc( K*N*sizeof( float ), 32 );
    C = (float *)mkl_malloc( M*N*sizeof( float ), 32 );

    for (size_t i = 0; i < M*K; ++i) A[i] = 1.0f;
    for (size_t i = 0; i < K*N; ++i) B[i] = 2.0f;
    for (size_t i = 0; i < M*N; ++i) C[i] = 1.0f;

    // Row-major C = alpha*A*B + beta*C with lda = K, ldb = N, ldc = N.
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, alpha, A, K, B, N, beta, C, N);

    printf("%f\n", max_val(C, M*N));

    mkl_free(A);
    mkl_free(B);
    mkl_free(C);
}
Compile:
icc -I/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/include -L/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64_lin test.cpp -o test_cblas -lmkl_rt
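Since alpha = 1, beta = 0, every element of A is 1, and every element of B is 2, each element of C should end up as K * 1 * 2 = 576 * 2 = 1152, regardless of what C contained beforehand. However, when I run this, I get an output of 1153.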
Upon looking closer, it turns out that most of the values in C are 1152 except for a bunch of contiguous chunks that are 1153, or more generally, 1152+(initial_value_of_C_array).
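For anyone who wants to reproduce the chunk pattern, the following is a rough sketch of how the bad regions can be located. The helper find_bad_runs is a hypothetical name introduced here for illustration; it scans the output and reports each contiguous run of elements that differ from the expected value as a (start index, length) pair.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical diagnostic helper: returns (start, length) for every
// contiguous run of elements in c[0..n) that differ from `expected`.
std::vector<std::pair<std::size_t, std::size_t>>
find_bad_runs(const float* c, std::size_t n, float expected)
{
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < n) {
        if (c[i] == expected) {
            ++i;
            continue;
        }
        const std::size_t start = i;          // first mismatching element
        while (i < n && c[i] != expected)
            ++i;                              // extend the run
        runs.emplace_back(start, i - start);
    }
    return runs;
}
```

Calling find_bad_runs(C, M*N, 1152.0f) after the cblas_sgemm call would list the affected chunks. For row-major storage, a start index s corresponds to row s / N and column s % N, which may help show whether the chunks line up with particular rows or column blocks.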
If I do this instead with CblasColMajor (and change the leading dimensions lda, ldb, and ldc accordingly), everything works fine.
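To be explicit about what changing the leading dimensions means here, this small sketch computes them for tightly packed, non-transposed matrices in either layout. The Layout enum and leading_dims function are hypothetical stand-ins written for this post so the snippet compiles without MKL; they are not MKL types.

```cpp
#include <cstddef>

// Hypothetical stand-in for the CBLAS layout argument.
enum class Layout { RowMajor, ColMajor };

struct LeadingDims {
    std::size_t lda, ldb, ldc;
};

// Leading dimensions for C (M x N) = A (M x K) * B (K x N),
// no transposes, matrices stored without padding.
LeadingDims leading_dims(Layout layout, std::size_t M, std::size_t N,
                         std::size_t K)
{
    if (layout == Layout::RowMajor)
        return {K, N, N};   // row length of A, B, C
    return {M, K, M};       // column length of A, B, C
}
```

For M = 64, N = 50176, K = 576 this gives lda = 576, ldb = 50176, ldc = 50176 in the row-major case (the values passed in the code above) and lda = 64, ldb = 576, ldc = 64 in the column-major case.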
What is going on??