Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

jit_gemm_convolution bwd data is too slow


Hi, I have run into a performance issue with jit_gemm_convolution. I have one convolution primitive with stride_w = 2 and jcp.t_pad = 3, so it cannot take the avx512 or avx2 path and falls back to the jit_gemm_convolution path. Our workload uses a small batch with large inputs: suppose the input is 2*3*2240*2240 (batch size = 2) on GoogLeNet v1, running on a Xeon Phi (68 cores). In the jit_gemm_convolution bwd data execute, the work is split across 2 threads, each handling one image of the batch (3*2240*2240). As a result it is very slow: sgemm and col2img run on only two cores, while the other 66 cores sit idle. How can I solve this? Or how can I make it run on the avx512/avx2 path? Thanks.
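
To put numbers on the problem, here is a rough Python sketch of the thread-utilization arithmetic. It is not MKL-DNN code. The 7x7 kernel is an assumption (it matches the first convolution of GoogLeNet v1, which has stride 2 and pad 3, but the post does not state the kernel size), and row_block plus the helper names are hypothetical, chosen only for illustration.

```python
import math


def conv_out_size(in_size, kernel, stride, pad):
    """Standard convolution output size along one spatial dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


def active_threads(work_items, n_threads):
    """A thread can only be busy if there is at least one work item for it."""
    return min(work_items, n_threads)


# Shapes from the post: 2*3*2240*2240, stride 2, pad 3, 68 cores.
batch, ih = 2, 2240
stride, pad = 2, 3
kernel = 7          # ASSUMPTION: 7x7 kernel, not stated in the post
n_threads = 68

oh = conv_out_size(ih, kernel, stride, pad)

# Parallelizing over the batch dimension only: one work item per image.
batch_only = active_threads(batch, n_threads)

# Hypothetical finer split: each image is also cut into blocks of output rows.
row_block = 16      # HYPOTHETICAL block height, for illustration only
tiles_per_image = math.ceil(oh / row_block)
tiled = active_threads(batch * tiles_per_image, n_threads)

print(f"output height: {oh}")                                  # 1120
print(f"batch-only split: {batch_only}/{n_threads} threads")   # 2/68
print(f"batch x row-block split: {tiled}/{n_threads} threads") # 68/68
```

With batch-only splitting, at most 2 of the 68 threads have work, which matches the behavior described above. A finer split would expose enough work items for every core, but note that for bwd data adjacent output-row blocks write to overlapping input rows whenever the kernel is larger than the stride, so a real implementation would need a reduction step or a split over input rows instead.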
