I'm using the trig transforms (MKL_STAGGERED_COSINE_TRANSFORM) within an OpenMP parallel region on Windows.
Each OpenMP thread has its own array of data to be transformed.
When I use one OpenMP thread my program works perfectly. When I use more than one, results are unpredictable.
If I enclose the trig transform code in a #pragma omp critical { } block, the results are correct again.
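
To show the shape of that workaround without posting the real code, here is a minimal sketch. Everything in it is an assumption made for illustration: transform_in_place is a hypothetical stand-in, not the MKL routine, and N, NUM_ARRAYS and the critical-section name trig_transform are made up. The stand-in deliberately keeps a shared scratch buffer so that unserialized calls would race.

```c
#include <assert.h>
#include <stddef.h>

#define N 8            /* samples per array (hypothetical size) */
#define NUM_ARRAYS 4   /* one array per unit of parallel work (hypothetical) */

/* Shared scratch buffer: this is what makes the stand-in unsafe to call
   from several threads at once. */
static double scratch[N];

/* Hypothetical stand-in for the library call, NOT the MKL routine.
   It doubles every sample and reverses the array, going through the
   shared scratch buffer on the way. */
static void transform_in_place(double *f, size_t n)
{
    assert(n <= N);
    for (size_t i = 0; i < n; ++i)
        scratch[i] = 2.0 * f[i];
    for (size_t i = 0; i < n; ++i)
        f[i] = scratch[n - 1 - i];
}

/* Each iteration works on its own row of data, but the call itself is
   wrapped in a named critical section, so only one thread at a time is
   inside transform_in_place. */
static void transform_all(double data[NUM_ARRAYS][N])
{
    #pragma omp parallel for
    for (int t = 0; t < NUM_ARRAYS; ++t) {
        #pragma omp critical(trig_transform)
        {
            transform_in_place(data[t], N);
        }
    }
}
```

Calling transform_all(data) on a NUM_ARRAYS x N array gives the same result with any number of threads, but the critical section serializes the transforms, so this arrangement gets no parallel speedup from that part of the code.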
It seems to me that the trigonometric transforms are not thread-safe. Results are similar with sequential and parallel MKL libraries.
My code is a bit too complex to post in full here.
Thanks
Rodney