jacket in CSS

How do I run MATLAB code on the GPU using CUDA?

I want to run MATLAB code on the GPU using NVIDIA's CUDA. I found a couple of 3rd-party engines:

Would anyone recommend these or are there better ones out there? Any tips or suggestions?

How to compute the k largest eigenvalues on the GPU?

I'm working on a parallel algorithm for spectral clustering, for which I need to calculate the K largest eigenvalues. I'm using the Jacket plugin for MATLAB, but sadly it doesn't support MATLAB's EIGS function (it cannot calculate the K eigenvalues in parallel). Can anyone please suggest another tool or library to do this task on the GPU? Or can I still do this in GPU-assisted MATLAB?
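For context, one way to get the k dominant eigenvalues without EIGS is orthogonal (subspace) iteration, which needs only matrix products and a thin QR factorization. The sketch below is plain CPU MATLAB on a made-up symmetric test matrix. Whether a given GPU library (Jacket included) provides mtimes and qr for its array types is an assumption to verify, and the matrix size, k, and iteration count are illustrative only.

```matlab
% Orthogonal (subspace) iteration for the k dominant eigenvalues of a
% symmetric matrix. Plain CPU MATLAB; running it on GPU arrays is an
% assumption that depends on the library supporting mtimes and qr.
rng(0);                       % reproducible random numbers
n = 200;                      % hypothetical problem size
k = 5;                        % number of eigenvalues wanted
B = randn(n);
A = B * B';                   % symmetric positive semidefinite test matrix

Q = orth(randn(n, k));        % random orthonormal starting basis
for it = 1:500                % fixed iteration count, for illustration only
    [Q, ~] = qr(A * Q, 0);    % multiply, then re-orthonormalize (economy QR)
end

T = Q' * A * Q;               % small k-by-k projected matrix
T = (T + T') / 2;             % remove rounding asymmetry before eig
lambda = sort(eig(T), 'descend');   % Ritz values approximate the top-k eigenvalues

% Compare against a full eigendecomposition (only feasible for small n).
ref = sort(eig(A), 'descend');
relErr = max(abs(lambda - ref(1:k)) ./ ref(1:k));
disp(relErr);
```

Subspace iteration converges to the eigenvalues of largest magnitude, which coincide with the largest eigenvalues only when the matrix is positive semidefinite, as in this test case. Convergence can be slow when the k-th and (k+1)-th eigenvalues are close together.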

Source: (StackOverflow)

GCOMPILE support for GFOR?

I stumbled across this problem while working with Jacket.

I use a compiled function (compiled with gcompile) within a gfor loop. This is meant to be supported as far as I know: http://wiki.accelereyes.com/wiki/index.php/GCOMPILE

But I observed that while the uncompiled function delivers the correct results, the compiled function gives the same output for all the gfor-iterations:

%================
% function[C] = test(A,B)
% C = A+B;
% end
%================

testing = gcompile('test.m');

A = gdouble(1:1:10);
B = gdouble(2:2:20);
C1 = gzeros(10,1);
C2 = gzeros(10,1);

gfor l=1:10
    C1(l) = test(A(l),B(l));
    C2(l) = testing(A(l),B(l));
gend

The output is:

C1 = [ 3,6,9,12,15,18,21,24,27,30] (correct result)

C2 = [ 3,3,3,3,3,3,3,3,3,3]

Can you verify/rebut my results? What am I doing wrong?

Cheers, Angela

Source: (StackOverflow)

mldivide very slow with jacket?

I wrote the code below. app and cova are gsingle arrays whose matrices have dimension equal to the variable dim in the code.

dim=32;
gfor q=1:256
     app(:,:,q)=cova(:,:,q)\geye(dim,dim,'single');
gend

If I increase dim, the computation becomes very slow. The equivalent code written with a plain for loop and CPU variables is faster. Why does this happen?
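For reference, the CPU baseline mentioned above might look like the following sketch. The test data (random, well-conditioned symmetric matrices) and the variable nq are assumptions for illustration, since the question does not show the contents of cova.

```matlab
% CPU version of the batched solve: one dim-by-dim system per slice q.
dim = 32;
nq  = 256;                               % number of slices (hypothetical name)

% Build well-conditioned symmetric positive definite test matrices.
cova = zeros(dim, dim, nq, 'single');
for q = 1:nq
    R = randn(dim, 'single');
    cova(:, :, q) = R * R' + dim * eye(dim, 'single');
end

app = zeros(dim, dim, nq, 'single');     % preallocate the result
I   = eye(dim, 'single');                % build the identity once, outside the loop

tic
for q = 1:nq
    app(:, :, q) = cova(:, :, q) \ I;    % solve cova(:,:,q) * X = I
end
toc
```

Timing this loop for several values of dim gives a baseline against which the gfor version can be compared.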

Source: (StackOverflow)

very slow matlab jacket if statement

I encountered a very slow if statement when using CUDA \ Jacket in MATLAB (5 sec vs 0.02 sec for the same code, which finds local maxima using a simple for loop and an if condition).

Being new to GPU programming, I went reading, and when I saw a previous SO discussion on MATLAB if statements with CUDA, I felt something was missing. You don't need to use CUDA to know that it is better to vectorize your code. However, there are cases where you will need an if statement anyway. For example, I'd like to find whether a pixel of a 2D image (say m(a,b)) is the local maximum of its 8 nearest neighbors. In MATLAB, an easy way to do that is by using 8 logical conditions in an if statement:

if m(a,b)>m(a-1,b-1) & m(a,b)>m(a,b-1) & m(a,b)>m(a+1,b-1) & ... etc on all nearest neighbors

I'd appreciate any ideas on how to resolve (or vectorize) this...
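One way to express the 8-neighbor test without a per-pixel if statement is to compare the interior of the image against eight shifted copies of itself. The sketch below is plain CPU MATLAB with a made-up test image; the names rows, cols, c, and isMax are illustrative, and whether the same indexing runs efficiently on Jacket arrays is an assumption.

```matlab
% Vectorized 8-neighbor local-maximum test on the interior pixels of m.
m = magic(7);                      % hypothetical test image
[nr, nc] = size(m);
rows = 2:nr-1;                     % interior rows (border pixels lack neighbors)
cols = 2:nc-1;                     % interior columns

c = m(rows, cols);                 % center pixels
isMax = true(size(c));             % start by assuming every pixel is a maximum

% Loop over the 8 neighbor offsets, not over the pixels.
for dr = -1:1
    for dc = -1:1
        if dr ~= 0 || dc ~= 0
            % Whole-array comparison against one shifted neighbor block.
            isMax = isMax & (c > m(rows + dr, cols + dc));
        end
    end
end

% isMax(i, j) is true when m(i+1, j+1) is strictly greater than all 8 neighbors.
disp(isMax);
```

The loop runs only eight times regardless of image size, so each iteration is a single elementwise array operation of the kind array-based GPU libraries are built for.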

Source: (StackOverflow)

Matlab's find statement using CUDA \ Jacket, or, is there a parallel way to find nonzero matrix elements?

I'd like to find the non-zero elements of a matrix after applying a threshold, as fast as possible. With CUDA \ Jacket in mind, I've learned that find is much slower there than in the "regular" CPU version of MATLAB, probably due to memory allocation issues, since the size of the output is not known before find runs. However, using 'bwlabel' and 'regionprops' (both supported in Jacket) does effectively yield information about the non-zero elements, and much faster than the functions in MATLAB's built-in Image Processing Toolbox. Is there a way to harness this to get the non-zero elements? Alternatively, is there a way to do some processing on each of the labeled objects found by bwlabel?
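To illustrate why the unknown output size matters, a common parallel formulation of find is stream compaction: a prefix sum over the threshold mask gives both the output size and the slot of each kept element, after which every element can be written independently. The sketch below is plain CPU MATLAB with made-up data. It shows the general technique and is not a description of how Jacket implements find.

```matlab
% Stream compaction: emulate find() with a prefix sum plus a scatter step.
x = [0 3 0 0 7 1 0 9];       % hypothetical data
mask = x > 2;                % threshold test, one independent result per element

pos = cumsum(mask);          % inclusive scan: output slot for each kept element
n   = pos(end);              % output size is known only after the scan
idx = zeros(1, n);           % allocate the output once its size is known

% Scatter: iterations are independent, so on a GPU each could be one thread.
for i = 1:numel(x)
    if mask(i)
        idx(pos(i)) = i;
    end
end

disp(idx);                           % 2 5 8
disp(isequal(idx, find(mask)));      % 1
```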

Source: (StackOverflow)

obtaining weighted centroids using regionprops in matlab + jacket

Using MATLAB's Image Processing Toolbox, I can find weighted centroids with the regionprops function. This is because the function can return either a WeightedCentroid or, via PixelList, a list of pixel positions for each labeled part of the image, from which the weighted centroid is easily calculated. However, Jacket's regionprops support only returns an unweighted centroid (that is, the centroid of the binary "island" obtained earlier using bwlabel). This means that the information about pixel positions was somehow used in order to find these centroids.

How can I access Jacket's regionprops information about the list of pixels it uses to calculate the unweighted centroid, so that I can use it to calculate a weighted centroid? (One important reason to do this is that the function find cannot be used in a gfor loop; otherwise one could find the different output values of bwlabel...)
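As a point of comparison, weighted centroids can be computed directly from the label matrix and the intensity image, with no pixel lists and no call to find, by accumulating per-label sums. The sketch below is plain CPU MATLAB; the small img and L matrices are made-up stand-ins for a real image and a bwlabel result, and whether Jacket offers accumarray on GPU arrays is an assumption.

```matlab
% Weighted centroids per label via accumarray, without find or PixelList.
img = [0 0 0 0 0;
       0 1 3 0 0;
       0 0 0 0 0;
       0 0 0 2 2];            % hypothetical intensity image
L   = [0 0 0 0 0;
       0 1 1 0 0;
       0 0 0 0 0;
       0 0 0 2 2];            % stand-in for the label matrix from bwlabel

[cols, rows] = meshgrid(1:size(L, 2), 1:size(L, 1));   % x and y coordinate grids

subs = L(:) + 1;              % shift labels so background (0) lands in bin 1
wsum = accumarray(subs, img(:));              % total weight per label
sx   = accumarray(subs, img(:) .* cols(:));   % weighted sum of x coordinates
sy   = accumarray(subs, img(:) .* rows(:));   % weighted sum of y coordinates

cx = sx(2:end) ./ wsum(2:end);   % drop the background bin, then normalize
cy = sy(2:end) ./ wsum(2:end);

disp([cx cy]);                % rows are [2.75 2] and [4.5 4]
```

Each row of the result is the (x, y) weighted centroid of one label, in the same coordinate convention as regionprops.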

Source: (StackOverflow)

Accelerating MATLAB code using GPUs?

AccelerEyes announced in December 2012 that it is working with MathWorks on GPU code and has discontinued its product Jacket for MATLAB:

http://blog.accelereyes.com/blog/2012/12/12/exciting-updates-from-accelereyes/

Unfortunately they do not sell Jacket licences anymore.

As far as I understand, the Jacket GPU Array solution based on ArrayFire was much faster than the gpuArray solution provided by MATLAB.

I started working with gpuArray, but I see that many functions are implemented poorly. For example a simple

myArray(:) = 0

is very slow. I have written some custom CUDA kernels, but the poorly implemented standard MATLAB functionality adds a lot of overhead, even when working with gpuArrays consistently throughout the code. I fixed some issues by replacing MATLAB code with hand-written CUDA code, but I do not want to reimplement the standard MATLAB functionality.

Another feature I am missing is sparse GPU matrices.

So my questions are:

How do I speed up the badly implemented default GPU implementations provided by MATLAB? In particular, how do I speed up sparse matrix operations in MATLAB using the GPU?
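To make the myArray(:) = 0 observation measurable, a timing sketch along the following lines could compare the in-place fill with allocating a fresh zero array on the device. It assumes the Parallel Computing Toolbox and a supported NVIDIA GPU; the array size is arbitrary, and no particular timings are claimed here.

```matlab
% Compare an in-place fill with a fresh zero allocation on the GPU.
% Assumes Parallel Computing Toolbox and a supported CUDA device.
dev = gpuDevice;                       % handle to the current GPU
A = rand(4000, 'gpuArray');            % hypothetical 4000-by-4000 test array

tic
A(:) = 0;                              % in-place fill via indexed assignment
wait(dev);                             % block until the GPU has finished
tInPlace = toc;

tic
B = zeros(size(A), 'like', A);         % new zero array with A's class, on the GPU
wait(dev);
tAlloc = toc;

fprintf('in-place: %.4f s, new allocation: %.4f s\n', tInPlace, tAlloc);
```

The wait(dev) calls matter because GPU operations can return before the work completes, which would otherwise make both timings look too small. Sparse gpuArray support was added in later MATLAB releases, and the set of supported sparse operations depends on the release, so the gpuArray documentation for the installed version is the place to check.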

Source: (StackOverflow)