[Solved] What is option -O3 for g++ and nvcc?

It’s optimization level 3, basically a shortcut for several other options related to speed optimization (see the links below). Regarding “I can’t find any documentation on it”: … it is one of the best-known options:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior
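To see the flag in context, here is a minimal sketch, not taken from the original answer. The file name `sum.cpp` and both function names are hypothetical. `__OPTIMIZE__` is a macro that GCC predefines whenever any optimization level is enabled, so it can confirm that an `-O` flag reached the compiler, but it cannot tell `-O1` from `-O3`.

```cpp
// Build with and without optimization (file name is hypothetical):
//   g++ -O0 -c sum.cpp -o sum_O0.o
//   g++ -O3 -c sum.cpp -o sum_O3.o
#include <cstddef>
#include <vector>

// A simple reduction loop; this is the kind of code an optimizer
// may unroll or vectorize at higher optimization levels.
long long sum_of_squares(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += static_cast<long long>(v[i]) * v[i];
    }
    return total;
}

// Reports whether the translation unit was compiled with optimization on.
// Assumption: the compiler is GCC or one that mimics its predefined macros.
bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

The result of `sum_of_squares` is the same at every level; only the generated machine code, and usually the run time, should differ.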

[Solved] Can I parallelize my program?

Your code is fairly straightforward, with lots of independent parallel loops. These parallel loops appear to be wrapped in an outer convergence do-while loop, so as long as you keep the data on the device for all iterations of the convergence loop, you won’t be bottlenecked by transfers. I would recommend starting with compiler …

[Solved] Cuda: Compact and result size

You can do this using thrust, as @RobertCrovella already pointed out. The following example uses thrust::copy_if to copy the indices of all elements for which the condition (“equals 7”) is fulfilled. thrust::counting_iterator is used to avoid creating the sequence of indices explicitly.

    #include <thrust/copy.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/functional.h>
    #include <iostream>

    using namespace thrust::placeholders;

    int main() { …
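As a rough host-only illustration of the same idea, the sketch below uses standard C++ `std::copy_if` instead of Thrust, so it is an analogue and not the answer's code. The function name `indices_equal_to` is hypothetical. Unlike `thrust::counting_iterator`, it materializes the index sequence with `std::iota`. The point it shows carries over: `copy_if` returns an iterator to the end of the written range, and the distance from the start of the output gives the result size.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Returns the indices i with data[i] == target.
// The size of the returned vector is the number of matches.
std::vector<int> indices_equal_to(const std::vector<int>& data, int target) {
    // Explicit index sequence 0, 1, ..., n-1 (a counting iterator would avoid this).
    std::vector<int> all(data.size());
    std::iota(all.begin(), all.end(), 0);

    // Output sized for the worst case, where every element matches.
    std::vector<int> out(data.size());
    auto end = std::copy_if(
        all.begin(), all.end(), out.begin(),
        [&](int i) { return data[static_cast<std::size_t>(i)] == target; });

    // Shrink to the compacted size: the distance to the returned end iterator.
    out.resize(static_cast<std::size_t>(std::distance(out.begin(), end)));
    return out;
}
```

Calling `indices_equal_to({3, 7, 1, 7, 7, 2}, 7)` yields the indices 1, 3 and 4, so the result size is 3.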

[Solved] how to use the cula device

I don’t know CULA. However, after a brief look at the reference guide (which I suggest consulting before asking on SO), you can use the CULA device functions just like host functions; you only have to pass device memory pointers to them.

    __global__ void kernel( double * A,double * B, curandState * globalState, int Asize, …