“Talk is cheap. Show me the code.”
Linus Torvalds
Add the corresponding locations of A and B, and store the result in C.
/* Serial C version. L is the vector length, assumed to be defined elsewhere. */
void vecadd(int *A, int *B, int *C)
{
    for (int i = 0; i < L; i++) {
        C[i] = A[i] + B[i];
    }
}
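/* OpenMP version. L and CHUNKSIZE are assumed to be defined elsewhere. */
void vecadd(int *A, int *B, int *C)
{
    int chunk = CHUNKSIZE;  /* iterations handed out per scheduling request */
    /* i is declared inside the loop, so it is already private to each thread */
    #pragma omp parallel shared(A, B, C, chunk)
    {
        #pragma omp for schedule(dynamic, chunk) nowait
        for (int i = 0; i < L; i++) {
            C[i] = A[i] + B[i];
        }
    }
}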
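The listings above rely on a length L defined somewhere outside the function. A minimal self-contained sketch, assuming the length is passed in as a parameter instead (the name vecadd_n and the parameter n are illustrative, not from the original slides):

```c
#include <assert.h>

/* Hypothetical variant of vecadd that takes the vector length explicitly.
 * Built with OpenMP enabled (for example -fopenmp on GCC or Clang), the loop
 * iterations are split across threads. Without OpenMP the pragma is ignored
 * and the loop simply runs serially, producing the same result. */
void vecadd_n(const int *A, const int *B, int *C, int n)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        C[i] = A[i] + B[i];
    }
}
```

Calling vecadd_n(a, b, c, 5) on two 5-element arrays fills c with the element-wise sums. A static schedule is used here because every iteration costs the same, so there is no load imbalance for a dynamic schedule to fix.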
#version 110
uniform sampler2D texture1;
uniform sampler2D texture2;
void main() {
    vec4 A = texture2D(texture1, gl_TexCoord[0].st);
    vec4 B = texture2D(texture2, gl_TexCoord[0].st);
    gl_FragColor = A + B;
}
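/* OpenCL kernel: one work-item per element.
   Assumes the global work size equals the vector length. */
__kernel
void vecadd(__global int *A,
            __global int *B,
            __global int *C)
{
    int id = get_global_id(0);
    C[id] = A[id] + B[id];
}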
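/* CUDA kernel: one thread per element.
   Assumes the grid covers exactly the vector length; otherwise a bounds check on id is needed. */
__global__
void vecadd(int *A, int *B, int *C)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    C[id] = A[id] + B[id];
}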
http://hpclab.blogspot.com/2011/09/is-gpu-good-for-large-vector-addition.html
“Life is too short for man pages, and occasionally much too short without them.”
Randall Munroe (xkcd.com)
Problem: Development is hard
Solution: Always have a spare GPU in your computer
Problem: Debugging is impossible
Solution: Write tests and run them!
Problem: Copying data to/from the GPU is slow
Solution: Use streams and compute while data is being transferred
Problem: GPUs don't like 64-bit computation
Solution: Wait for the next release
Problem: I don't want to code a lot
Solution: Use libraries
Before you code your custom solution.
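For the "write tests and run them" advice, one common approach is to check whatever the parallel or GPU version produced against a computation simple enough to trust by inspection. A minimal sketch in plain C, assuming the result has already been copied back into an ordinary host array (the name count_mismatches is hypothetical):

```c
#include <assert.h>

/* Hypothetical test helper: returns how many positions of C differ from
 * the element-wise sum A[i] + B[i]. A return value of 0 means the result
 * under test matches the serial reference computation. */
int count_mismatches(const int *A, const int *B, const int *C, int n)
{
    assert(n >= 0);
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (C[i] != A[i] + B[i]) {
            bad++;
        }
    }
    return bad;
}
```

A test would run the implementation under test, then require count_mismatches(A, B, C, n) == 0. Returning a count rather than a yes/no answer makes it easier to tell a single off-by-one at the end of the array from a result that is wrong everywhere.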
postgres=# SELECT COUNT(*) FROM t1 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 15;
count
-------
6718
(1 row)
Time: 7019.855 ms
postgres=# SELECT COUNT(*) FROM t2 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 15;
count
-------
6718
(1 row)
Time: 176.301 ms
t1 and t2 contain the same 10 million records, but t1 is a regular table and t2 is a foreign table managed by PG-Strom. The same query takes about 7.0 s on t1 and about 0.18 s on t2, roughly a 40x difference.
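The WHERE clause in this query is an independent per-row test, which gives it the same shape as the vecadd loops earlier. A sketch of that predicate in C, assuming the rows are available as two arrays of doubles (the function name and parameters are hypothetical, and this does not show how PG-Strom itself evaluates the query):

```c
#include <assert.h>

/* Hypothetical illustration: counts the points (x[i], y[i]) that lie
 * strictly within distance r of the centre (cx, cy).
 * Compares squared distances, which is equivalent to sqrt(d2) < r
 * for r >= 0 and avoids the square root. */
int count_within_radius(const double *x, const double *y, int n,
                        double cx, double cy, double r)
{
    assert(n >= 0 && r >= 0.0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        double dx = x[i] - cx;
        double dy = y[i] - cy;
        if (dx * dx + dy * dy < r * r) {
            count++;
        }
    }
    return count;
}
```

The query above corresponds to count_within_radius(x, y, n, 25.6, 12.8, 15.0). No row depends on any other, so the loop body can be evaluated for many rows at once.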