This is a conceptual question:
Expression templates are a popular C++ technique for implementing matrix and array operations efficiently, by avoiding unnecessary temporaries and by enabling loop unrolling. In other words, with expression templates an expression such as D = A+B+C, where D, A, B and C are matrices, does not incur the temporaries that a naive C++ implementation would usually create. How does this compare, in performance terms, with C++ wrappers around the MKL BLAS routines? In other words, will a naive implementation of a Matrix/Array class that wraps the optimized BLAS routines perform at least as well as an implementation using expression templates?
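To make the D = A+B+C case concrete, here is a minimal sketch of the expression-template idea. All names in it (Expr, Sum, Vec, demo) are hypothetical and chosen only for illustration, and it uses a flat 1-D array of doubles in place of a real matrix class. It is not how any particular library does it, just the bare mechanism as I understand it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: Expr, Sum, Vec and demo are made-up names.

// CRTP base, so that operator+ below only matches our own expression types.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for "lhs + rhs". Nothing is added until operator[] is called.
// Operands are held by reference, so a Sum must not outlive the full
// expression that created it (e.g. do not store it with `auto`).
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    Sum(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

// Flat array standing in for a matrix (assumption: element-wise ops only).
class Vec : public Expr<Vec> {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning from an expression evaluates it in one loop,
    // writing straight into this object's storage.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Builds a Sum node instead of computing a temporary Vec.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

// Small usage example: D = A + B + C with no intermediate Vec.
double demo() {
    Vec A(4, 1.0), B(4, 2.0), C(4, 3.0), D(4);
    D = A + B + C;  // type of the right-hand side: Sum<Sum<Vec, Vec>, Vec>
    return D[0];
}
```

Here A + B + C only builds a small Sum<Sum<Vec, Vec>, Vec> object; the additions happen inside the single loop in Vec::operator=, so no intermediate array is allocated. A naive operator+ returning Vec by value would instead allocate one temporary per +.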
I realise this question is quite general, but I would be grateful if someone could give me some hints on this.
Thanks!