Hi all,
I'm a first-time user of MKL, and I thought a good way to get the hang of it would be to replicate the results in this Intel blog post:
Obviously I'm not using the same CPU, so I'm not expecting identical results. However, I'm seeing negative scaling when multi-threading.
I built Caffe2 with MKL BLAS and OpenMP enabled. I'm using the same benchmark mentioned in the blog post: convnet_benchmarks.py (https://github.com/pytorch/pytorch/blob/master/caffe2/python/convnet_benchmarks.py)
From what I've read, it's often best to set OMP_NUM_THREADS to 1 and MKL_NUM_THREADS to no more than the number of physical cores. So I run the benchmark like so:
export MKL_NUM_THREADS="8"
export OMP_NUM_THREADS="1"
python convnet_benchmarks.py --batch_size 8 --model AlexNet --iterations 10 --warmup_iterations 1 --cpu
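To make the scaling comparison repeatable, the same command can be run across several thread counts from a small script. The following is only a minimal sketch: the helper names `run_with_threads` and `sweep` and the thread counts in the example are illustrative assumptions, not part of Caffe2 or MKL. It measures wall time for the whole process, which includes Python startup and model construction, so the per-iteration timings printed by the benchmark itself are the more precise numbers.

```python
import os
import subprocess
import sys
import time


def run_with_threads(cmd, mkl_threads, omp_threads=1):
    """Run cmd with the given thread settings; return (seconds, stdout)."""
    # Copy the current environment and override only the two thread variables.
    env = dict(os.environ)
    env["MKL_NUM_THREADS"] = str(mkl_threads)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


def sweep(cmd, thread_counts):
    """Run cmd once per MKL thread count; return {count: wall-time seconds}."""
    timings = {}
    for n in thread_counts:
        seconds, _ = run_with_threads(cmd, n)
        timings[n] = seconds
    return timings


# Example call (assumes convnet_benchmarks.py is in the current directory):
# sweep([sys.executable, "convnet_benchmarks.py", "--batch_size", "8",
#        "--model", "AlexNet", "--iterations", "10",
#        "--warmup_iterations", "1", "--cpu"], (1, 2, 4, 8))
```

Setting the variables per run through `env` rather than `export` keeps each measurement independent of whatever is set in the surrounding shell.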
I use mpstat to monitor core usage and can confirm that it is in fact running on multiple cores, and yet performance drops, even when I run the benchmark on only 2 threads. It seems to me that there is a lot of overhead when MKL_NUM_THREADS is set above 1. Has anyone else run into similar issues? I've noticed the topic of overhead come up here and there on the forums, but it doesn't seem to be the same issue.