Channel: Intel® Software - Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Comparing FFT Performance MKL6 vs MKL11

Hi All,

I'm currently evaluating MKL11 to decide whether it should replace the older MKL6 that has been in use until now. I wrote a small console application to compare the FFT performance (for the moment just the computation time, not the numerical accuracy), but the results rather surprised me: MKL11 seems to be slower than MKL6.

For each FFT length (2^4 to 2^20), the program runs 1100 FFTs and measures the time. The attached plots show the average, minimum and maximum over all 1100 loops (green). The red curves exclude the first 100 loops from the logging, which makes no big difference. All times are per single FFT calculation.

Plotting both average curves shows that MKL6 needs approximately half the time of MKL11.

I was a bit surprised by these results. Does anyone have experience with the FFT performance? Another thing that keeps me wondering is the outliers with MKL11, which don't occur nearly as often with MKL6.
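
As an aside on the outliers: min and max alone are dominated by a single slow call, so a median and a high percentile over the stored per-call times may describe them better. The following is only a minimal sketch of that idea, not part of the test program below. The function name `percentile` and the nearest-rank definition are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: nearest-rank percentile of a set of timings.
// p must be in (0, 100]. The vector is taken by value so the caller's
// sample order is left untouched.
double percentile(std::vector<double> samples, double p)
{
    assert(!samples.empty());
    assert(p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    // Nearest-rank definition: 1-based rank = ceil(p/100 * N).
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    if (rank < 1)
        rank = 1;
    return samples[rank - 1];
}
```

Called as `percentile(times, 50.0)` and `percentile(times, 99.0)` on the individual times of one FFT length, a large gap between the two values would point to occasional slow calls rather than a uniformly slower transform.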

My testing code:

int FFT_Kernel_float(unsigned int Nfft, void* pIn, void* pOut)
{
   int status;
   DFTI_DESCRIPTOR_HANDLE hand = 0;
   status = DftiCreateDescriptor(&hand, DFTI_SINGLE, DFTI_REAL, 1, Nfft);
   status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
   status = DftiCommitDescriptor(hand);
   status = DftiComputeForward(hand, pIn, pOut);
   DftiFreeDescriptor(&hand);
   return status;
}

for (exp=exponent_start;exp<=exponent_stop;exp++) //2^4 to 2^20
   {
      Nfft = (unsigned int) pow(2.0,exp);
      cxfTimesig.alloc(Nfft);
      cxfTimeaxis.alloc(Nfft);
      cxfFreqsig.alloc(Nfft);

      for (i=0;i<Nfft;i++)
      {
         cxfTimeaxis[i] = ((float) i + 1.0) / fs;
         cxfTimesig[i]  = ((float)rnd.Get()/UINT_MAX)*2-1; //random signal
      }

      Time_all_min       = 1e6;
      Time_all_max       = 0;
      Time_firstexcl_min = 1e6;
      Time_firstexcl_max = 0;
      
      hpfcAllLoops.Start(); //start time for all loops
      for (i=0;i<loops;i++) //loops = 1100
      {
         if (i==exclude_first_from_avg)
            hpfcFirstExcluded.Start(); //start timer once the first excluded loops are done
         hpfcIndividual.Start(); //start timer for single execution
         status = FFT_Kernel_float(Nfft,cxfTimesig.ptr(), cxfFreqsig.ptr());
         Time_individual = hpfcIndividual.Time();
         if (i>=exclude_first_from_avg) //exclude_first_from_avg = 100, so loops 0..99 are skipped
         {
            Time_firstexcl_max = max(Time_firstexcl_max,Time_individual);
            Time_firstexcl_min = min(Time_firstexcl_min,Time_individual);
         }
         Time_all_max = max(Time_all_max,Time_individual);
         Time_all_min = min(Time_all_min,Time_individual);
      }

      Time_all_tot       = hpfcAllLoops.Time();
      Time_firstexcl_tot = hpfcFirstExcluded.Time();
      
      Time_firstexcl_avg = Time_firstexcl_tot / (double) (loops - exclude_first_from_avg);
      Time_all_avg       = Time_all_tot       / (double) loops;

      //log data here
   }
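
Note that FFT_Kernel_float creates, commits and frees the descriptor on every call, so each measured time covers the setup as well as the transform itself. To see how much of the difference comes from the transform alone, the setup could be done once per FFT length and only the compute call timed. The following is a minimal sketch of such a harness, not a definitive implementation. `TimingStats` and `time_compute_only` are hypothetical names, the three callables are stand-ins for the Dfti calls named in the comments, and the code assumes C++11 (`<chrono>`, lambdas), which is newer than Visual Studio 2008.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper type: summary of per-call timings in seconds.
struct TimingStats {
    double min_s = std::numeric_limits<double>::max();
    double max_s = 0.0;
    double sum_s = 0.0;
    std::size_t count = 0;

    void add(double t)
    {
        min_s = std::min(min_s, t);
        max_s = std::max(max_s, t);
        sum_s += t;
        ++count;
    }
    // Average over exactly the calls that were added.
    double avg() const
    {
        return count ? sum_s / static_cast<double>(count) : 0.0;
    }
};

// Runs `setup` once, then calls `compute` `loops` times and records the
// duration of each call after the first `warmup` calls, then runs
// `teardown` once. Setup and teardown are not included in the statistics.
template <class Setup, class Compute, class Teardown>
TimingStats time_compute_only(Setup setup, Compute compute, Teardown teardown,
                              std::size_t loops, std::size_t warmup)
{
    assert(warmup < loops);
    typedef std::chrono::steady_clock clock;
    TimingStats stats;

    setup();    // stand-in for DftiCreateDescriptor, DftiSetValue, DftiCommitDescriptor
    for (std::size_t i = 0; i < loops; ++i) {
        const clock::time_point t0 = clock::now();
        compute();  // stand-in for DftiComputeForward(hand, pIn, pOut)
        const std::chrono::duration<double> dt = clock::now() - t0;
        if (i >= warmup)
            stats.add(dt.count());
    }
    teardown(); // stand-in for DftiFreeDescriptor(&hand)
    return stats;
}
```

With loops = 1100 and warmup = 100, the average is taken over exactly the 1000 calls that also feed min and max. Running the same harness once with the setup inside `compute` and once with it outside would show how the total splits between descriptor handling and the transform.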


Any opinions or experiences on this issue? Am I comparing apples and oranges?

Marian

(Win7, Intel i5-2500, C++, Visual Studio 2008)

