Double precision running faster than single precision in CUDA
I have implemented an algorithm in CUDA, and it seems to run faster in
double precision than in single precision.
I know that single precision is usually faster on GPUs. My GPU is an NVIDIA
GeForce GT 650M.
The algorithm pseudo code is the following:
for k to numIterations
    for j to numRowsOfAnMatrix
        CUDAmemset(double array)
        CUBLASdotproduct(double arrayGPU, double arrayGPU)
        CUBLASdotproduct(double arrayGPU, double arrayGPU)
        CUBLASscalarVectorMultiplication(scalarCPU, double arrayGPU) [using cublasDaxpy]
        CUBLASvectorSum(scalarCPU, double arrayGPU) [using cublasDaxpy]
    end
end
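For anyone who wants to reproduce this, here is a minimal sketch of the loop above as compilable host code, templated so that the float and double versions go through exactly the same path and are timed the same way. The names `dot`, `axpy`, `timeLoop`, and the vector length `n` are hypothetical. Which arrays feed which call, and the use of the dot-product results as the axpy scalars, are assumptions, since the pseudocode does not specify them. Error checking is omitted for brevity.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Thin overloads (hypothetical helpers) so one loop serves both precisions.
inline cublasStatus_t dot(cublasHandle_t h, int n, const float* x,
                          const float* y, float* result) {
    return cublasSdot(h, n, x, 1, y, 1, result);
}
inline cublasStatus_t dot(cublasHandle_t h, int n, const double* x,
                          const double* y, double* result) {
    return cublasDdot(h, n, x, 1, y, 1, result);
}
inline cublasStatus_t axpy(cublasHandle_t h, int n, float alpha,
                           const float* x, float* y) {
    return cublasSaxpy(h, n, &alpha, x, 1, y, 1);
}
inline cublasStatus_t axpy(cublasHandle_t h, int n, double alpha,
                           const double* x, double* y) {
    return cublasDaxpy(h, n, &alpha, x, 1, y, 1);
}

// Runs the doubly nested loop from the pseudocode on vectors of length n
// and returns the elapsed GPU time in milliseconds.
template <typename T>
float timeLoop(int numIterations, int numRows, int n) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    T *x = nullptr, *y = nullptr, *work = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(T);
    cudaMalloc(reinterpret_cast<void**>(&x), bytes);
    cudaMalloc(reinterpret_cast<void**>(&y), bytes);
    cudaMalloc(reinterpret_cast<void**>(&work), bytes);
    cudaMemset(x, 0, bytes);  // placeholder data: all zeros
    cudaMemset(y, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int k = 0; k < numIterations; ++k) {
        for (int j = 0; j < numRows; ++j) {
            cudaMemset(work, 0, bytes);
            // With the default host pointer mode, each dot call blocks
            // until its scalar result is available on the host.
            T d1, d2;
            dot(handle, n, x, y, &d1);
            dot(handle, n, x, y, &d2);
            // Assumption: the two host scalars scale x and y into work.
            axpy(handle, n, d1, x, work);
            axpy(handle, n, d2, y, work);
        }
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait for all queued work to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    cudaFree(work);
    cublasDestroy(handle);
    return ms;
}
```

Calling `timeLoop<float>(50, numRows, n)` and `timeLoop<double>(50, numRows, n)` back to back, after one untimed warm-up call, should give two directly comparable timings. Build with `nvcc` and link against cuBLAS (`-lcublas`).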
The times I get for 50 iterations are 20.996 seconds for single precision
and 20.1881 seconds for double precision.
Any idea why double precision is faster?