Sunday, 8 September 2013

Double precision being faster than single precision CUDA

I have implemented an algorithm in CUDA, and it seems to run faster in
double precision than in single precision.
I know that single precision is usually faster on a GPU. My GPU is an Nvidia
GeForce GT 650M.
The algorithm's pseudocode is as follows:
for k = 1 to numIterations
    for j = 1 to numRowsOfAnMatrix
        cudaMemset(double arrayGPU)
        CUBLASdotproduct(double arrayGPU, double arrayGPU)
        CUBLASdotproduct(double arrayGPU, double arrayGPU)
        CUBLASscalarVectorMultiplication(scalarCPU, double arrayGPU)  [using cublasDaxpy]
        CUBLASvectorSum(scalarCPU, double arrayGPU)  [using cublasDaxpy]
    end
end
For 50 iterations I obtain 20.996 seconds in single precision and
20.1881 seconds in double precision.
Any idea why double precision is faster?
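Because the two totals are so close, a quick worked check helps put the difference in proportion. The helper names below are hypothetical; the inputs are the timings reported in the post.

```c
/* Worked check of the reported timings (50 outer iterations each). */

/* Average wall time per outer iteration, in seconds. */
static double per_iteration(double total_seconds, int iterations)
{
    return total_seconds / (double)iterations;
}

/* Gap between two timings as a fraction of the first (t_a).
   Positive when the second run (t_b) finished sooner. */
static double relative_gap(double t_a, double t_b)
{
    return (t_a - t_b) / t_a;
}
```

With the reported numbers, per_iteration gives about 0.420 s for single precision and 0.404 s for double precision, and relative_gap(20.996, 20.1881) is about 0.038, so the double-precision run finished roughly 3.8% sooner. A gap of that size may be worth re-measuring over several runs before attributing it to the choice of precision.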