[Solved] What is option -O3 for g++ and nvcc?

It’s optimization level 3: essentially a shortcut that turns on a number of other options related to speed optimization (see the links below). It is one of the best-known compiler options: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior

[Solved] Can I parallelize my program?

Your code is fairly straightforward, with lots of independent parallel loops. These parallel loops appear to be wrapped in an outer convergence do-while loop, so as long as you keep the data on the device for all iterations of the convergence loop, you won’t be bottlenecked by transfers. I would recommend starting with compiler … Read more

[Solved] Cuda: Compact and result size

You can do this using thrust, as @RobertCrovella already pointed out. The following example uses thrust::copy_if to copy the indices of all elements for which the condition (“equals 7”) is fulfilled. thrust::counting_iterator is used to avoid creating the sequence of indices explicitly.

#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <iostream>

using namespace thrust::placeholders;

int main() { … Read more

[Solved] how to use the cula device

I don’t know cula. However, after a brief look at the reference guide (which I suggest consulting before asking on SO), you can use cula device functions just as you would host functions; you just have to pass device memory pointers to them. __global__ void kernel( double * A, double * B, curandState * globalState, int Asize, … Read more

[Solved] vector addition in CUDA using streams

One problem is how you are handling h_A, h_B, and h_C:

h_A = (float *) wbImport(wbArg_getInputFile(args, 0), &inputLength);
h_B = (float *) wbImport(wbArg_getInputFile(args, 1), &inputLength);

The above lines of code create an allocation for h_A and h_B and import some data (presumably). These lines of code:

cudaHostAlloc((void **) &h_A, size, cudaHostAllocDefault);
cudaHostAlloc((void **) &h_B, … Read more

[Solved] I have CUDA installed on Win10, but Anaconda makes me reinstall it in the environment

Anaconda can only detect and manage packages within its own environments. It cannot and will not detect or use an existing CUDA installation when installing packages with a CUDA dependency. Note, however, that the cudatoolkit package which conda installs is not a complete CUDA toolkit distribution. It only contains the necessary libraries … Read more

[Solved] pgi cuda fortran compiling error

You’re calling this a “CUDA Fortran” code, but it is syntactically incorrect whether you ultimately want to run the subroutine on the host (CPU) or the device (GPU). You may wish to refer to this blog post as a quick-start guide. If you want to run the subroutine increment on the GPU, you have not … Read more

[Solved] Pytorch crashes cuda on wrong line

I found the answer in a completely unrelated thread in the forums. I couldn’t find a Googleable answer, so I’m posting it here for future users’ sake. Since CUDA calls are executed asynchronously, you should run your code with CUDA_LAUNCH_BLOCKING=1 python script.py This makes sure the right line of code throws the error message. … Read more