
CUDA interview questions

Top frequently asked CUDA interview questions

What is the canonical way to check for errors using the CUDA runtime API?

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError, cudaPeekAtLastError, and cudaGetErrorString, but what is the best way to put these together to reliably catch and report errors without requiring lots of extra code?

Source: (StackOverflow)

Can I program Nvidia's CUDA using only Python or do I have to learn C?

I guess the question speaks for itself. I'm interested in doing some serious computations but am not a programmer by trade. I can string enough Python together to get done what I want. But can I write a program in Python and have the GPU execute it using CUDA? Or do I have to use some mix of Python and C?

The examples on Klöckner's PyCUDA webpage had a mix of both Python and C, so I'm not sure what the answer is.

If anyone wants to chime in about OpenCL, feel free. I heard about this CUDA business only a couple of weeks ago and didn't know you could use your video cards like this.

Source: (StackOverflow)


Coding CUDA with C#?

I've been looking for information on coding CUDA (the NVIDIA GPU language) with C#. I have seen a few of the libraries, but it seems that they would add a bit of overhead (because of the P/Invoke calls, etc.).

  • How should I go about using CUDA in my C# applications? Would it be better to code it in, say, C++ and compile that into a DLL?
  • Would this overhead of using a wrapper kill any advantages I would get from using CUDA?
  • And are there any good examples of using CUDA with C#?

Thanks, Max

Source: (StackOverflow)

Error Message : Cannot find or open the PDB file

I tried running the sample programs provided on NVIDIA's official site. Most of the programs ran smoothly, except for a few where I get similar error messages. How can I fix that? Here's a sample of the error message I got after running a program named "MatrixMul".

Note: I have installed both the 32-bit and 64-bit NVIDIA CUDA Toolkit v5.0 on my Windows 7 x64 OS.

Source: (StackOverflow)

How to get the nvidia driver version from the command line?

For debugging CUDA code and checking compatibility, I need to find out which NVIDIA driver version is installed for my GPU. I found "How to get the cuda version?", but that does not help me here.

Source: (StackOverflow)
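
One possible approach, sketched in Python: the nvidia-smi utility that ships with the driver accepts the options --query-gpu=driver_version --format=csv,noheader and prints one version string per GPU. The sketch assumes nvidia-smi is on the PATH; the helper names parse_driver_versions and query_driver_versions are made up for illustration and are not part of any NVIDIA API.

```python
import shutil
import subprocess
from typing import List


def parse_driver_versions(smi_output: str) -> List[str]:
    """Return one driver version string per non-empty line of nvidia-smi CSV output."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def query_driver_versions() -> List[str]:
    """Ask nvidia-smi for the driver version; return [] if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_driver_versions(result.stdout)
```

Calling query_driver_versions() on a machine with a working driver should return a list with one entry per GPU; an empty list means the tool was not found or the query failed. On Linux, reading /proc/driver/nvidia/version is another place to look.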

Compression library using Nvidia's CUDA [closed]

Does anyone know a project which implements standard compression methods (like Zip, GZip, BZip2, LZMA,...) using NVIDIA's CUDA library?

I was wondering whether algorithms that can make use of a lot of parallel tasks (like compression) would run much faster on a graphics card than on a dual- or quad-core CPU.

What do you think about the pros and cons of such an approach?

Source: (StackOverflow)

How to get the cuda version?

Is there any quick command or script to check the version of the installed CUDA? I found the manual for 4.0 under the installation directory, but I'm not sure whether that is the version actually installed. Thanks!

Source: (StackOverflow)
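
A minimal sketch of one way to check, in Python: the toolkit's compiler driver reports its release when run as nvcc --version, in a line of the form "Cuda compilation tools, release X.Y, ...". The sketch assumes nvcc is on the PATH, and the helper names are hypothetical. It reports the toolkit that the first nvcc on the PATH belongs to, which is not necessarily the only toolkit installed and is separate from the driver version.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'X.Y' release number from the text printed by nvcc --version."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def query_toolkit_version() -> Optional[str]:
    """Run nvcc --version and return the release string, or None if nvcc is unavailable."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)
```

If query_toolkit_version() returns None, nvcc may simply not be on the PATH; the installation directory can still be inspected by hand.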

What is a bank conflict? (Doing Cuda/OpenCL programming)

I have been reading the programming guides for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. I tried googling for "bank conflict" and "bank conflict computer science", but I couldn't find much. Can anybody help me understand or point me to a good link? I have no preference whether the help is in the context of CUDA/OpenCL or just bank conflicts in general in computer science, thanks :)

Source: (StackOverflow)
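
As a rough illustration of the idea, here is a small Python model, not GPU code. It assumes shared memory is split into 32 banks with successive 4-byte words mapped to successive banks, which matches NVIDIA devices of compute capability 2.x and later (older devices used 16 banks). Under that assumption, threads of a warp that request different words from the same bank are serviced one after another, while threads that request the same word share a single broadcast. The function names are invented for this sketch.

```python
from collections import defaultdict
from typing import Iterable, List


def conflict_degree(word_indices: Iterable[int], num_banks: int = 32) -> int:
    """Return the largest number of distinct words requested from any single bank.

    1 means conflict-free; N means the access is split into N serialized passes.
    Threads that read the same word are counted once, modelling a broadcast.
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


def strided_access(stride: int, warp_size: int = 32) -> List[int]:
    """Word index touched by each thread when thread t reads element t * stride."""
    return [thread * stride for thread in range(warp_size)]


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(strided_access(stride)))
```

In this model, strides 1 and 33 are conflict-free, stride 2 gives a 2-way conflict, and stride 32 sends every thread to the same bank for a 32-way conflict. That is why padding a 32-wide shared array to 33 columns is a common remedy.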

Using GPU from a docker container?

I'm searching for a way to use the GPU from inside a docker container.

The container will execute arbitrary code, so I don't want to use the privileged mode.

Any tips?

From previous research I understood that run -v and/or LXC cgroup was the way to go, but I'm not sure how to pull that off exactly.

Source: (StackOverflow)

difference between global and device functions

Can anyone describe the difference between __global__ and __device__?
When should I use __device__, and when should I use __global__?

Source: (StackOverflow)

CUDA model - what is warp size?

What's the relationship between maximum work group size and warp size? Let’s say my device has 240 CUDA streaming processors (SP) and returns the following information -





This means it has eight SPs per streaming multiprocessor (that is, compute unit). Now how is warp size = 32 related to these numbers?

Source: (StackOverflow)
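
The arithmetic behind the question can be sketched in Python. The figures 240 SPs, eight SPs per multiprocessor, and a warp size of 32 come from the question; the cycle estimate assumes each SP handles one thread-instruction per clock, which is a simplification, and the work group size of 100 in the last line is purely a hypothetical example value.

```python
import math


def multiprocessor_count(total_sps: int, sps_per_sm: int) -> int:
    """Number of streaming multiprocessors (compute units) on the device."""
    return total_sps // sps_per_sm


def cycles_per_warp_instruction(warp_size: int, sps_per_sm: int) -> int:
    """Clock cycles needed to push one instruction through every thread of a warp."""
    return math.ceil(warp_size / sps_per_sm)


def warps_per_group(group_size: int, warp_size: int) -> int:
    """Warps needed to cover a work group; the last warp may be partly empty."""
    return math.ceil(group_size / warp_size)


print(multiprocessor_count(240, 8))          # multiprocessors on the device
print(cycles_per_warp_instruction(32, 8))    # cycles per warp instruction
print(warps_per_group(100, 32))              # warps for a hypothetical group of 100
```

Under these assumptions the device has 30 multiprocessors, each issuing one warp instruction over four clocks, and a work group is always scheduled in whole warps, so sizes that are multiples of 32 waste no lanes.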

NVIDIA vs AMD: GPGPU performance

I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA.

NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, and 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all).

On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today, GeForce 580 ($500), is rated at 1.6 single-precision TFlops. AMD Radeon 6970 can be had for $370 and it is rated at 2.7 TFlops. The 580 has 512 execution units at 772 MHz. The 6970 has 1536 execution units at 880 MHz.

How realistic is that paper advantage of AMD over NVIDIA, and is it likely to be realized in most GPGPU tasks? What happens with integer tasks?

Source: (StackOverflow)
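
The quoted ratings can be reproduced with a short Python calculation. Two assumptions are not stated in the question: a fused multiply-add is counted as two floating-point operations per unit per cycle, and the GeForce 580's execution units run at a shader clock that is twice the 772 MHz figure quoted. The function name peak_tflops is illustrative only.

```python
def peak_tflops(units: int, clock_mhz: float,
                flops_per_cycle: int = 2, clock_multiplier: int = 1) -> float:
    """Theoretical single-precision peak: units x effective clock x FLOPs per cycle."""
    effective_hz = clock_mhz * 1e6 * clock_multiplier
    return units * effective_hz * flops_per_cycle / 1e12


# Assumption: the 580's shader clock is double the quoted 772 MHz core clock.
gtx580 = peak_tflops(512, 772, clock_multiplier=2)
hd6970 = peak_tflops(1536, 880)

print(round(gtx580, 2), round(hd6970, 2))        # peak TFLOPS for each card
print(round(gtx580 * 1000 / 500, 1),             # GFLOPS per dollar at $500
      round(hd6970 * 1000 / 370, 1))             # GFLOPS per dollar at $370
```

These come out at roughly 1.58 and 2.70 TFLOPS, in line with the ratings cited above. They are theoretical peaks only and say nothing about how much of that throughput a given GPGPU workload can actually reach on either architecture.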

Best approach for GPGPU/CUDA/OpenCL in Java?

General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing.

I'd love to use GPGPU for image processing, particles, and fast geometric operations.

Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know:

  • Is OpenCL usable yet from Java on Windows/Mac?
  • What libraries or other ways are there to interface with OpenCL/CUDA?
  • Is using JNA directly an option?
  • Am I forgetting something?

Any real-world experience/examples/war stories are appreciated.

Source: (StackOverflow)

GPU Emulator for CUDA programming without the hardware

Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware?


I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my desktop for doing this development. I would like to do some work on my netbook instead, but my netbook doesn't have a GPU. As far as I know, you need a CUDA-capable GPU to run CUDA. Is there a way to get around this? It would seem that the only way is a GPU emulator (which obviously would be painfully slow, but would work). But whatever way there is to do this, I would like to hear it.

I'm programming on Ubuntu 10.04 LTS.

Source: (StackOverflow)