[Solved] Nvidia Tesla T4 tensor core benchmark [closed]
This might be more of an extended comment, but hear me out… As pointed out in the comments, the CUDA Samples are not meant as performance-measuring tools. The second benchmark you provided does not actually use tensor cores; it only issues ordinary instructions that execute on the FP32 or FP64 cores:
for(int i=0; i<compute_iterations; i++){ tmps[j] …
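For contrast, the sketch below shows one way code actually reaches the tensor cores: the warp-level WMMA API from `<mma.h>` (compute capability 7.0 or higher; the T4 is sm_75). It is a minimal correctness check for a single 16x16x16 tile, not a benchmark. The kernel name `wmma_tile_kernel` and the helper `wmma_max_abs_error` are hypothetical names chosen for illustration, and the input values are arbitrary small integers picked so the half-precision inputs are exact.

```cuda
// Minimal sketch (not a benchmark): one warp multiplies a single 16x16x16 tile
// on tensor cores through the WMMA API. Needs compute capability >= 7.0;
// the Tesla T4 is sm_75, so compile with e.g. `nvcc -arch=sm_75`.
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

constexpr int N = 16;  // tile edge for half inputs with float accumulation

// Must be launched with exactly one warp: WMMA operations are warp-collective.
__global__ void wmma_tile_kernel(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, N, N, N, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, N, N, N, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, N, N, N, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, N);  // leading dimension = 16 elements
    wmma::load_matrix_sync(b_frag, b, N);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // tensor-core MMA
    wmma::store_matrix_sync(d, acc_frag, N, wmma::mem_row_major);
}

// Hypothetical helper: runs the kernel on small integer-valued matrices
// (exactly representable in half) and returns the largest absolute
// difference from a CPU reference product.
float wmma_max_abs_error() {
    std::vector<float> fa(N * N), fb(N * N), ref(N * N, 0.0f), out(N * N);
    std::vector<half> ha(N * N), hb(N * N);
    for (int i = 0; i < N * N; ++i) {
        fa[i] = static_cast<float>(i % 5) - 2.0f;  // values in [-2, 2]
        fb[i] = static_cast<float>(i % 3) - 1.0f;  // values in [-1, 1]
        ha[i] = __float2half(fa[i]);
        hb[i] = __float2half(fb[i]);
    }
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            for (int k = 0; k < N; ++k)
                ref[r * N + c] += fa[r * N + k] * fb[k * N + c];

    half* da = nullptr;
    half* db = nullptr;
    float* dd = nullptr;
    cudaError_t err = cudaMalloc(&da, N * N * sizeof(half));
    assert(err == cudaSuccess);
    err = cudaMalloc(&db, N * N * sizeof(half));
    assert(err == cudaSuccess);
    err = cudaMalloc(&dd, N * N * sizeof(float));
    assert(err == cudaSuccess);
    cudaMemcpy(da, ha.data(), N * N * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), N * N * sizeof(half), cudaMemcpyHostToDevice);

    wmma_tile_kernel<<<1, 32>>>(da, db, dd);  // one block, one warp
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(out.data(), dd, N * N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dd);

    float max_err = 0.0f;
    for (int i = 0; i < N * N; ++i)
        max_err = std::fmax(max_err, std::fabs(out[i] - ref[i]));
    return max_err;
}
```

Calling `wmma_max_abs_error()` from a host `main` on a T4 should return 0, since every product and partial sum here is a small integer. Disassembling the binary with `cuobjdump -sass` should show HMMA instructions for the `mma_sync` call, whereas a scalar loop like the one in the question compiles to ordinary FFMA/DFMA instructions. To measure throughput you would instead keep many warps busy with many tiles, or call cuBLAS with tensor-op math enabled.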