When I call the mkl_cspblas_dcsrgemv function and ask for the input matrix to be transposed (transa = "T"), the program's heap gets corrupted.
This happens with both threaded and sequential MKL. The example program below, which illustrates the problem, links against sequential MKL to get a cleaner valgrind report:
```c
// icc dcsrgemv_bug.c -mkl=sequential -g
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

/* Print the first and last elements and the sum of absolute values of t[0..n-1]. */
void dump_vec(const char *s, double *t, MKL_INT n)
{
    printf("SUM %s [ %.1f ... %7.1f ] = %9.1f\n", s, t[0], t[n-1], cblas_dasum(n, t, 1));
}

int main(int argc, char *argv[])
{
    MKL_INT r = 750;
    MKL_INT c = 250;
    MKL_INT v = c*2;
    double *V = calloc(v, sizeof(double));
    MKL_INT *C = calloc(v, sizeof(double));    /* sizeof(double) over-allocates MKL_INT; harmless */
    MKL_INT *R = calloc(r+1, sizeof(double));
    double *X = calloc(c, sizeof(double));
    double *Y = calloc(r, sizeof(double));
    double *Z = calloc(c, sizeof(double));
    MKL_INT i;

    /* Matrix for (r, c)
       1 _ _ _ .... _
       _ 2 _ _ .... _
       _ _ 3 _ .... _
       _ _ _ 4 .... _
       : : : : .    :
       : : : : .    :
       _ _ _ _ .... c
       1 _ _ _ .... _
       _ 1 _ _ .... _
       _ _ 1 _ .... _
       _ _ _ 1 .... _
       : : : : .    :
       : : : : .    :
       _ _ _ _ .... 1
       _ _ _ _ .... _
       : : : : .    :
       : : : : .    :
       _ _ _ _ .... _
    */
    for (i = 0; i < c; ++i) {
        V[i] = i + 1;
        V[i + c] = 1;
        C[i] = i;
        C[i + c] = i;
    }
    for (i = 0; i < v; ++i) {
        R[i] = i;
    }
    for (i = v; i <= r; ++i) {
        R[i] = v;
    }
    for (i = 0; i < c; ++i) {
        X[i] = 1.0;
    }

    dump_vec("X", X, c);
    mkl_cspblas_dcsrgemv("N", &r, V, R, C, X, Y);   /* Y = A * X */
    dump_vec("Y", Y, r);
    mkl_cspblas_dcsrgemv("T", &r, V, R, C, Y, Z);   /* Z = A^T * Y */
    dump_vec("Z", Z, c);
    return 0;
}
```
This program crashes on exit due to heap corruption.
When I run this through valgrind, I get the following report:
```
bawr@core:~/SVDLIBT$ valgrind ./a.out
==1591== Memcheck, a memory error detector
==1591== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1591== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==1591== Command: ./a.out
==1591==
SUM X [ 1.0 ... 1.0 ] = 250.0
SUM Y [ 1.0 ... 0.0 ] = 31625.0
==1591== Invalid write of size 4
==1591==    at 0x9C75CE6: mkl_spblas_lp64_def_dcsr0tg__c__mvout_par (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_def.so)
==1591==    by 0x593180A: mkl_spblas_lp64_dcsr0tg__c__mvout_omp (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.so)
==1591==    by 0x58AEA99: mkl_spblas_lp64_mkl_cspblas_dcsrgemv (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.so)
==1591==    by 0x4012B5: main (dcsrgemv_bug.c:78)
==1591==  Address 0x8771f50 is 0 bytes after a block of size 2,000 alloc'd
==1591==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1591==    by 0x401038: main (dcsrgemv_bug.c:24)
==1591==
SUM Z [ 2.0 ... 62501.0 ] = 5239875.0
==1591==
==1591== Process terminating with default action of signal 11 (SIGSEGV)
==1591==  Access not within mapped region at address 0x28
==1591==    at 0x4009F7E: do_lookup_x (dl-lookup.c:98)
==1591==    by 0x400A990: _dl_lookup_symbol_x (dl-lookup.c:737)
==1591==    by 0x400F556: _dl_fixup (dl-runtime.c:111)
==1591==    by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==1591==    by 0x81DE75F: __call_tls_dtors (cxa_thread_atexit_impl.c:83)
==1591==    by 0x81DE086: __run_exit_handlers (exit.c:40)
==1591==    by 0x81DE194: exit (exit.c:104)
==1591==    by 0x81C3ECB: (below main) (libc-start.c:321)
==1591==  If you believe this happened as a result of a stack
==1591==  overflow in your program's main thread (unlikely but
==1591==  possible), you can try to increase the size of the
==1591==  main thread stack using the --main-stacksize= flag.
==1591==  The main thread stack size used in this run was 8388608.
==1591==
==1591== Process terminating with default action of signal 11 (SIGSEGV)
==1591==  Access not within mapped region at address 0x28
==1591==    at 0x4009F7E: do_lookup_x (dl-lookup.c:98)
==1591==    by 0x400A990: _dl_lookup_symbol_x (dl-lookup.c:737)
==1591==    by 0x400F556: _dl_fixup (dl-runtime.c:111)
==1591==    by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==1591==    by 0x4A256BC: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==1591==  If you believe this happened as a result of a stack
==1591==  overflow in your program's main thread (unlikely but
==1591==  possible), you can try to increase the size of the
==1591==  main thread stack using the --main-stacksize= flag.
==1591==  The main thread stack size used in this run was 8388608.
==1591==
==1591== HEAP SUMMARY:
==1591==     in use at exit: 25,640 bytes in 12 blocks
==1591==   total heap usage: 12 allocs, 0 frees, 25,640 bytes allocated
==1591==
==1591== LEAK SUMMARY:
==1591==    definitely lost: 24,208 bytes in 9 blocks
==1591==    indirectly lost: 0 bytes in 0 blocks
==1591==      possibly lost: 0 bytes in 0 blocks
==1591==    still reachable: 1,432 bytes in 3 blocks
==1591==         suppressed: 0 bytes in 0 blocks
==1591== Rerun with --leak-check=full to see details of leaked memory
==1591==
==1591== For counts of detected and suppressed errors, rerun with: -v
==1591== ERROR SUMMARY: 592 errors from 1 contexts (suppressed: 2 from 1)
Segmentation fault (core dumped)
```
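The 2,000-byte block in the first valgrind error can be matched to an allocation with a little arithmetic, assuming 8-byte doubles as on this x86-64 build (every buffer in the example is allocated with sizeof(double)):

$$
\begin{aligned}
V,\ C &: 2c \times 8 = 500 \times 8 = 4000 \text{ bytes} \\
R &: (r+1) \times 8 = 751 \times 8 = 6008 \text{ bytes} \\
Y &: r \times 8 = 750 \times 8 = 6000 \text{ bytes} \\
X,\ Z &: c \times 8 = 250 \times 8 = 2000 \text{ bytes}
\end{aligned}
$$

Only X and Z are 2,000 bytes, and the failing call (the transposed one) takes Y as input and writes Z. A plausible reading, not confirmed by this run, is that the routine follows its documented interface, in which A is an m-by-m square matrix and y holds m entries: with m = r = 750 it would then store r - c = 500 doubles past the end of the 250-element Z.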
Am I doing something wrong, or is this an MKL bug?