How is a loop mapped onto the GPU in blocks, warps, and threads?
I would like some clarification on how a loop is mapped onto the device when using OpenACC. I'm also unsure about the roles of blocks, warps, and threads.
If I have a loop like this:
#pragma acc parallel loop
for (i = 0; i < 1024; i++) {
    vector[i] += 1;
}
And my GPU supports "maximum threads per block = 1024". How is the loop parallelized into blocks? My first thought is that a single block is sufficient to handle the operations, since the vector has 1024 elements. In that case, I think the block is composed of 1024 threads, each one performing the operation vector[i] += 1; with a different index i.
Is my understanding of what a thread is correct?
I would then have 32 warps of 32 threads each. How are they executed? Can all of them run simultaneously?
from Recent Questions - Stack Overflow https://ift.tt/3rAfE79