Quantcast
Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all articles
Browse latest Browse all 3005

Performance gets worse over time for the same instructions

$
0
0

First, I'm not if this is the right forum for this question. I don't know what is as the reason could be due to hardware or MKL or .NET or some other hidden factors.

I have a neural network code in C# which heavily uses MKL via PInvoke. I set a fixed number of threads and disabled dynamic threading of MKL. The C# code is used mainly before and after training. However, during training (i.e. between iterations), MKL carries most of the computational body. No memory is allocated and there's no I/O during training.

I have observed unpredictable performance across iterations (example below) and woud like to understand why. In some other runs, the number of connections processed per second dropped to ~600M for a few iterations (very strange). For the one below, it took 6h to finish the training (i.e. each iteration takes about 12 minutes on average). It's rather consistent that the perf degrates towards the end. The perf accounting is more consistent when I run a smaller job (e.g. 20 minutes to finish).

The code is large and not sharable. If you can't pinpoint why, a hint to help me investigate further would also be appreciated.

Iterations:1/30, 1504.65M connections processed per second
Iterations:2/30, 1505.16M connections processed per second
Iterations:3/30, 1505.16M connections processed per second
Iterations:4/30, 1504.96M connections processed per second
Iterations:5/30, 1503.38M connections processed per second
Iterations:6/30, 1504.68M connections processed per second
Iterations:7/30, 1502.40M connections processed per second
Iterations:8/30, 1506.11M connections processed per second
Iterations:9/30, 1503.20M connections processed per second
Iterations:10/30, 1504.95M connections processed per second
Iterations:11/30, 1502.34M connections processed per second
Iterations:12/30, 1498.91M connections processed per second
Iterations:13/30, 1490.70M connections processed per second
Iterations:14/30, 1477.59M connections processed per second
Iterations:15/30, 1459.92M connections processed per second
Iterations:16/30, 1433.61M connections processed per second
Iterations:17/30, 1402.28M connections processed per second
Iterations:18/30, 1356.30M connections processed per second
Iterations:19/30, 1342.68M connections processed per second
Iterations:20/30, 1306.84M connections processed per second
Iterations:21/30, 1263.10M connections processed per second
Iterations:22/30, 1236.72M connections processed per second
Iterations:23/30, 1209.60M connections processed per second
Iterations:24/30, 1183.91M connections processed per second
Iterations:25/30, 1157.60M connections processed per second
Iterations:26/30, 1140.60M connections processed per second
Iterations:27/30, 1112.54M connections processed per second
Iterations:28/30, 1086.06M connections processed per second
Iterations:29/30, 1071.61M connections processed per second
Iterations:30/30, 1055.94M connections processed per second

Viewing all articles
Browse latest Browse all 3005

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>