Unsupervised pre-training for convolutional neural networks in Theano

I would like to design a deep net with one or more convolutional layers (CNN) and one or more fully connected hidden layers on top.
For deep networks with fully connected layers, there are methods in Theano for unsupervised pre-training, e.g., using denoising auto-encoders or RBMs.

My question is: how can I implement (in Theano) an unsupervised pre-training stage for the convolutional layers?

I do not expect a full implementation as an answer, but I would appreciate a link to a good tutorial or a reliable reference.

Python theano with index computed inside the loop

I have installed the Theano library for increasing the speed of a computation, so that I can use the power of a GPU.

However, inside the inner loop of the computation a new index is calculated, based on the loop index and corresponding values of a couple of arrays.

That calculated index is then used to access an element of another array, which, in turn, is used for another calculation.

Is this too complicated to expect any significant speedups from Theano?

Let me rephrase my question the other way round. Here is an example GPU code snippet. Some initialisations are left out for brevity. Can I translate this to Python/Theano without increasing computation times considerably?

__global__ void SomeKernel(const cuComplex* __restrict__ data,
                           float* __restrict__ voxels)
{
    // One thread per voxel; NX is the grid width (initialisation left out).
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;

    unsigned int pos = (idy * NX + idx);

    // Each voxel stores three consecutive floats: x, y, z.
    unsigned int ind1 = pos * 3;
    float x = voxels[ind1];
    float y = voxels[ind1 + 1];
    float z = voxels[ind1 + 2];

    for (int m = 0; m < M; ++m)
    {
        unsigned int ind2 = 3 * m;

        float diff_x = x - some_pos[ind2];
        float diff_y = y - some_pos[ind2 + 1];
        float diff_z = z - some_pos[ind2 + 2];

        float distance = sqrtf(diff_x * diff_x
                             + diff_y * diff_y
                             + diff_z * diff_z);

        // The index into data depends on the loop index and the distance.
        unsigned int dist = rintf(distance / some_factor);
        unsigned int ind3 = m * another_factor + dist;

        cuComplex some_element = data[ind3];

        // Main calculation starts here, involving some_element.
    }
}
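
For comparison, here is a minimal NumPy sketch of the index computation in the kernel above, written as whole-array operations instead of an explicit loop over m. The function name gather_elements and the array layouts (voxels as a (P, 3) array, some_pos as an (M, 3) array, data as a flat complex array) are assumptions made for illustration; they are not part of the original code. It only shows that the computed index can be expressed as an integer array used for a gather. Whether the equivalent Theano graph is fast on a GPU is still the open question.

```python
import numpy as np

def gather_elements(voxels, some_pos, data, some_factor, another_factor):
    """Vectorised version of the per-voxel loop over m (hypothetical helper)."""
    # Pairwise differences between every voxel and every position: (P, M, 3).
    diff = voxels[:, None, :] - some_pos[None, :, :]
    # Euclidean distance for every (voxel, m) pair: (P, M).
    distance = np.sqrt((diff ** 2).sum(axis=-1))
    # Round to the nearest integer bin, as rintf does in the kernel.
    dist = np.rint(distance / some_factor).astype(np.int64)
    # ind3 = m * another_factor + dist, computed for all pairs at once.
    m = np.arange(some_pos.shape[0])
    ind3 = m[None, :] * another_factor + dist
    # Integer-array indexing gathers data[ind3] for every pair: (P, M).
    return data[ind3]

# Tiny made-up inputs, only to exercise the function.
voxels = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]], dtype=np.float32)
some_pos = np.zeros((2, 3), dtype=np.float32)
data = np.arange(20).astype(np.complex64)
elements = gather_elements(voxels, some_pos, data, 1.0, 10)
```

The main calculation would then operate on the whole elements array rather than on one some_element at a time.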

Installing theano on Windows 8 with GPU enabled

I understand that Theano support for Windows 8.1 is only at an experimental stage, but I wonder if anyone has had any luck resolving my issues. Depending on my config, I get three distinct types of errors. I assume that resolving any one of them would solve my problem.

I have installed Python using the 32-bit WinPython distribution, with MinGW as described here. The contents of my .theanorc file are as follows:

device = gpu

compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\

ldflags = 

When I run import theano the error is as follows:

nvcc fatal   : nvcc cannot find a supported version of Microsoft Visual Studio.
Only the versions 2010, 2012, and 2013 are supported

['nvcc', '-shared', '-g', '-O3', '--compiler-bindir', 'C:\\Program Files (x86)\\
Microsoft Visual Studio 10.0\\VC\\bin# flags=-m32 # we have this hard coded for
now', '-Xlinker', '/DEBUG', '-m32', '-Xcompiler', '-DCUDA_NDARRAY_CUH=d67f7c8a21
306c67152a70a88a837011,/Zi,/MD', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-pa
ckages\\theano\\sandbox\\cuda', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-pac
kages\\numpy\\core\\include', '-IC:\\TheanoPython\\python-2.7.6\\include', '-o',
.pyd', 'mod.cu', '-LC:\\TheanoPython\\python-2.7.6\\libs', '-LNone\\lib', '-LNon
e\\lib64', '-LC:\\TheanoPython\\python-2.7.6', '-lpython27', '-lcublas', '-lcuda
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return st
atus', 1, 'for cmd', 'nvcc -shared -g -O3 --compiler-bindir C:\\Program Files (x
86)\\Microsoft Visual Studio 10.0\\VC\\bin# flags=-m32 # we have this hard coded
 for now -Xlinker /DEBUG -m32 -Xcompiler -DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a
70a88a837011,/Zi,/MD -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\thean
o\\sandbox\\cuda -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\numpy\\co
re\\include -IC:\\TheanoPython\\python-2.7.6\\include -o C:\\Users\\Matej\\AppDa
ing_3_GenuineIntel-2.7.6-32\\cuda_ndarray\\cuda_ndarray.pyd mod.cu -LC:\\TheanoP
ython\\python-2.7.6\\libs -LNone\\lib -LNone\\lib64 -LC:\\TheanoPython\\python-2
.7.6 -lpython27 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not availabl

I have also tested it using Visual Studio 12.0 which is installed on my system with the following error:

nvlink fatal   : Could not open input file 'C:/Users/Matej/AppData/Local/Temp/tm

['nvcc', '-shared', '-g', '-O3', '--compiler-bindir', 'C:\\Program Files (x86)\\
Microsoft Visual Studio 12.0\\VC\\bin\\', '-Xlinker', '/DEBUG', '-m32', '-Xcompi
ler', '-LC:\\TheanoPython\\python-2.7.6\\libs,-DCUDA_NDARRAY_CUH=d67f7c8a21306c6
7152a70a88a837011,/Zi,/MD', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-package
s\\theano\\sandbox\\cuda', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages
\\numpy\\core\\include', '-IC:\\TheanoPython\\python-2.7.6\\include', '-o', 'C:\
, 'mod.cu', '-LC:\\TheanoPython\\python-2.7.6\\libs', '-LNone\\lib', '-LNone\\li
b64', '-LC:\\TheanoPython\\python-2.7.6', '-lpython27', '-lcublas', '-lcudart']
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return st
atus', 1, 'for cmd', 'nvcc -shared -g -O3 --compiler-bindir C:\\Program Files (x
86)\\Microsoft Visual Studio 12.0\\VC\\bin\\ -Xlinker /DEBUG -m32 -Xcompiler -LC
a837011,/Zi,/MD -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\theano\\sa
ndbox\\cuda -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\numpy\\core\\i
nclude -IC:\\TheanoPython\\python-2.7.6\\include -o C:\\Users\\Matej\\AppData\\L
_GenuineIntel-2.7.6-32\\cuda_ndarray\\cuda_ndarray.pyd mod.cu -LC:\\TheanoPython
\\python-2.7.6\\libs -LNone\\lib -LNone\\lib64 -LC:\\TheanoPython\\python-2.7.6
-lpython27 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not availabl

With the latter error, several pop-up windows ask me how I would like to open a (.res) file before the error is thrown.

cl.exe is present in both folders (i.e. VS 2010 and VS 2013).

Finally, if I set VS 2013 in the environment path and set .theanorc contents as follows:

base_compiledir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
floatX = float32
device = gpu

compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\

ldflags = 

I get the following error:

c:\theanopython\python-2.7.6\include\pymath.h(22): warning: dllexport/dllimport conflict with "round"
c:\program files\nvidia gpu computing toolkit\cuda\v6.5\include\math_functions.h(2455): here; dllimport/dllexport dropped

mod.cu(954): warning: statement is unreachable

mod.cu(1114): error: namespace "std" has no member "min"

mod.cu(1145): error: namespace "std" has no member "min"

mod.cu(1173): error: namespace "std" has no member "min"

mod.cu(1174): error: namespace "std" has no member "min"

mod.cu(1317): error: namespace "std" has no member "min"

mod.cu(1318): error: namespace "std" has no member "min"

mod.cu(1442): error: namespace "std" has no member "min"

mod.cu(1443): error: namespace "std" has no member "min"

mod.cu(1742): error: namespace "std" has no member "min"

mod.cu(1777): error: namespace "std" has no member "min"

mod.cu(1781): error: namespace "std" has no member "min"

mod.cu(1814): error: namespace "std" has no member "min"

mod.cu(1821): error: namespace "std" has no member "min"

mod.cu(1853): error: namespace "std" has no member "min"

mod.cu(1861): error: namespace "std" has no member "min"

mod.cu(1898): error: namespace "std" has no member "min"

mod.cu(1905): error: namespace "std" has no member "min"

mod.cu(1946): error: namespace "std" has no member "min"

mod.cu(1960): error: namespace "std" has no member "min"

mod.cu(3750): error: namespace "std" has no member "min"

mod.cu(3752): error: namespace "std" has no member "min"

mod.cu(3784): error: namespace "std" has no member "min"

mod.cu(3786): error: namespace "std" has no member "min"

mod.cu(3789): error: namespace "std" has no member "min"

mod.cu(3791): error: namespace "std" has no member "min"

mod.cu(3794): error: namespace "std" has no member "min"

mod.cu(3795): error: namespace "std" has no member "min"

mod.cu(3836): error: namespace "std" has no member "min"

mod.cu(3838): error: namespace "std" has no member "min"

mod.cu(4602): error: namespace "std" has no member "min"

mod.cu(4604): error: namespace "std" has no member "min"

31 errors detected in the compilation of "C:/Users/Matej/AppData/Local/Temp/tmpxft_00001d84_00000000-10_mod.cpp1.ii".
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 2, 'for cmd', 'nvcc -shared -g -O3 -Xlinker /DEBUG -m32 -Xcompiler -DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a70a88a837011,/Zi,/MD -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\theano\\sandbox\\cuda -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\numpy\\core\\include -IC:\\TheanoPython\\python-2.7.6\\include -o C:\\Users\\Matej\\AppData\\Local\\Theano\\compiledir_Windows-8-6.2.9200-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-2.7.6-32\\cuda_ndarray\\cuda_ndarray.pyd mod.cu -LC:\\TheanoPython\\python-2.7.6\\libs -LNone\\lib -LNone\\lib64 -LC:\\TheanoPython\\python-2.7.6 -lpython27 -lcublas -lcudart')
ERROR:theano.sandbox.cuda:Failed to compile cuda_ndarray.cu: ('nvcc return status', 2, 'for cmd', 'nvcc -shared -g -O3 -Xlinker /DEBUG -m32 -Xcompiler -DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a70a88a837011,/Zi,/MD -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\theano\\sandbox\\cuda -IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\numpy\\core\\include -IC:\\TheanoPython\\python-2.7.6\\include -o C:\\Users\\Matej\\AppData\\Local\\Theano\\compiledir_Windows-8-6.2.9200-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-2.7.6-32\\cuda_ndarray\\cuda_ndarray.pyd mod.cu -LC:\\TheanoPython\\python-2.7.6\\libs -LNone\\lib -LNone\\lib64 -LC:\\TheanoPython\\python-2.7.6 -lpython27 -lcublas -lcudart')

['nvcc', '-shared', '-g', '-O3', '-Xlinker', '/DEBUG', '-m32', '-Xcompiler', '-DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a70a88a837011,/Zi,/MD', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\theano\\sandbox\\cuda', '-IC:\\TheanoPython\\python-2.7.6\\lib\\site-packages\\numpy\\core\\include', '-IC:\\TheanoPython\\python-2.7.6\\include', '-o', 'C:\\Users\\Matej\\AppData\\Local\\Theano\\compiledir_Windows-8-6.2.9200-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-2.7.6-32\\cuda_ndarray\\cuda_ndarray.pyd', 'mod.cu', '-LC:\\TheanoPython\\python-2.7.6\\libs', '-LNone\\lib', '-LNone\\lib64', '-LC:\\TheanoPython\\python-2.7.6', '-lpython27', '-lcublas', '-lcudart']

If I run import theano without the GPU option on, it runs without a problem. Also CUDA samples run without a problem.

Theano HiddenLayer Activation Function

Is there any way to use a Rectified Linear Unit (ReLU) as the activation function of the hidden layer instead of tanh() in Theano? The hidden layer is implemented as follows, and as far as I have been able to find on the internet, ReLU is not implemented in Theano.

class HiddenLayer(object):
  def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
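
To make clear what I am after, here is a minimal NumPy sketch of the activation itself. The function name relu is my own. I assume the analogous symbolic expression would be something like theano.tensor.maximum(x, 0), wrapped in a function and passed as the activation argument, but that usage is untested here.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(x, 0) applied element-wise."""
    return np.maximum(x, 0.0)

# Negative inputs are clipped to zero, non-negative inputs pass through.
activations = relu(np.array([-2.0, 0.0, 3.5]))
```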

theano - print value of TensorVariable

How can I print the numerical value of a theano TensorVariable? I'm new to theano, so please be patient :)

I have a function where I get y as a parameter. Now I want to debug-print the shape of this y to the console. Using

print y.shape

results in the following console output (I was expecting numbers, e.g. (2,4,4)):


Or how can I print the numerical result of for example the following code (this counts how many values in y are bigger than half the maximum):

errorCount = T.sum(T.gt(T.abs_(y),T.max(y)/2.0))

errorCount should be a single number because T.sum sums up all the values. But using

print errorCount

gives me the following (I expected something like 134):


How can I assign/update subset of tensor shared variable in Theano?

When compiling a function in Theano, a shared variable (say X) can be updated by specifying updates=[(X, new_value)]. Now I am trying to update only a subset of a shared variable:

import numpy
import theano
from theano import tensor as T

X = theano.shared(numpy.array([0, 1, 2, 3, 4]))
Y = T.vector()
f = theano.function([Y], updates=[(X[2:4], Y)])  # error occurs:
                                                 # 'update target must
                                                 # be a SharedVariable'

This code raises the error "update target must be a SharedVariable", which I guess means that an update target cannot be a non-shared variable such as a slice. So is there any way to compile a function that updates just a subset of a shared variable?
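
One way to think about it, sketched in plain NumPy: since the update target must be the whole variable, the new value can be a full copy in which only the slice is replaced. The helper name set_slice is hypothetical. I assume the Theano counterpart is theano.tensor.set_subtensor, used as shown in the final comment, but that line is not executed here.

```python
import numpy as np

def set_slice(x, start, stop, y):
    """Return a copy of x whose elements start:stop are replaced by y."""
    new_x = x.copy()
    new_x[start:stop] = y
    return new_x

X_value = np.array([0, 1, 2, 3, 4])
# The original array is left untouched; the update is the full new value.
updated = set_slice(X_value, 2, 4, np.array([20, 30]))

# Assumed Theano analogue (not executed here):
#   updates=[(X, T.set_subtensor(X[2:4], Y))]
```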

Purpose of 'given' variables in

I was reading the code for the logistic function given at http://deeplearning.net/tutorial/logreg.html. I am confused about the difference between input & given variables for a function. The functions that compute mistakes made by a model on a minibatch are:

test_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]})

validate_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

Why couldn't/wouldn't one just make x and y shared input variables and let them be defined when an actual model instance is created?

Theano fails due to NumPy Fortran mixup under Ubuntu

I installed Theano on my machine, but the nosetests break with a NumPy/Fortran-related error message. To me it looks like NumPy was compiled with a different Fortran version than Theano. I already reinstalled Theano (sudo pip uninstall theano + sudo pip install --upgrade --no-deps theano) and NumPy / SciPy (apt-get install --reinstall python-numpy python-scipy), but this did not help.

What steps would you recommend?

Complete error message:

ImportError: ('/home/Nick/.theano/compiledir_Linux-2.6.35-31-generic-x86_64-with-Ubuntu-10.10-maverick--2.6.6/tmpIhWJaI/0c99c52c82f7ddc775109a06ca04b360.so: undefined symbol: _gfortran_st_write_done'

My research:

The Installing SciPy / BuildingGeneral page says the following about the "undefined symbol: _gfortran_st_write_done" error:

If you see an error message

ImportError: /usr/lib/atlas/libblas.so.3gf: undefined symbol: _gfortran_st_write_done

when building SciPy, it means that NumPy picked up the wrong Fortran compiler during build (e.g. ifort).

Recompile NumPy using:

python setup.py build --fcompiler=gnu95

or whichever is appropriate (see python setup.py build --help-fcompiler).


Nick@some-serv2:/usr/local/lib/python2.6/dist-packages/numpy$ python setup.py build --help-fcompiler
This is the wrong setup.py file to run

Used software versions:

  • scipy 0.10.1 (scipy.test() works)
  • NumPy 1.6.2 (numpy.test() works)
  • theano 0.5.0 (several tests fail with undefined symbol: _gfortran_st_write_done)
  • python 2.6.6
  • Ubuntu 10.10


So I removed numpy and scipy from my system with apt-get remove, and used find -name XXX -delete on what was left.

Then I installed numpy and scipy from the GitHub sources with sudo python setup.py install.

Afterwards I again ran sudo pip uninstall theano and sudo pip install --upgrade --no-deps theano.

Error persists :/

I also tried the apt-get source ... + apt-get build-dep ... approach, but on my old Ubuntu (10.10) it installs versions of numpy and scipy that are too old for theano: ValueError: numpy >= 1.4 is required (detected 1.3.0 from /usr/local/lib/python2.6/dist-packages/numpy/__init__.pyc)

numpy array from csv file for lasagne

I started learning how to use theano with lasagne, beginning with the mnist example. Now I want to try my own example: I have a train.csv file in which every row starts with a 0 or 1 representing the correct answer, followed by 773 0s and 1s representing the input. I don't understand how to turn this file into the required numpy arrays in the load_database() function. This is the relevant part of the original function for the mnist database:


with gzip.open(filename, 'rb') as f:
    data = pickle_load(f, encoding='latin-1')

# The MNIST dataset we have here consists of six numpy arrays:
# Inputs and targets for the training set, validation set and test set.
X_train, y_train = data[0]
X_val, y_val = data[1]
X_test, y_test = data[2]


# We just return all the arrays in order, as expected in main().
# (It doesn't matter how we do this as long as we can read them again.)
return X_train, y_train, X_val, y_val, X_test, y_test

I need to get X_train (the input) and y_train (the first value of every row) from my csv file.
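
To make the target concrete, this is roughly the shape of result I am after, as a minimal NumPy sketch. The function name load_csv_dataset, the comma delimiter, and the float32/int32 dtypes are my assumptions, and the two-row sample stands in for train.csv (the real file would have 774 columns).

```python
import io
import numpy as np

def load_csv_dataset(source, delimiter=","):
    """Split each row into a label (first column) and inputs (the rest)."""
    # ndmin=2 keeps a 2-D array even if the file has a single row.
    data = np.loadtxt(source, delimiter=delimiter, dtype=np.float32, ndmin=2)
    y = data[:, 0].astype(np.int32)  # first value of every row: the answer
    X = data[:, 1:]                  # remaining values: the input
    return X, y

# A tiny in-memory stand-in for train.csv; a file path works the same way.
sample = io.StringIO("1,0,1,1\n0,1,0,0\n")
X_train, y_train = load_csv_dataset(sample)
```

Splitting off validation and test sets would then be a matter of slicing X_train and y_train by row.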


Theano: Why does indexing fail in this case?

I'm trying to get the max of a vector, restricted to the elements selected by a boolean mask.

With Numpy:

>>> this = np.arange(10)
>>> this[~(this>=5)].max()

But with Theano:

>>> that = T.arange(10, dtype='int32')
>>> that[~(that>=5)].max().eval()
>>> that[~(that>=5).nonzero()].max().eval()
Traceback (most recent call last):
  File "<pyshell#146>", line 1, in <module>
AttributeError: 'TensorVariable' object has no attribute 'nonzero'

Why does this happen? Is this a subtle nuance that I'm missing?
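
For reference, here is the NumPy behaviour I am trying to reproduce, as a small sketch with my own variable names. It also shows that indexing with the mask and indexing with mask.nonzero() select the same elements in NumPy. Note that in my Theano attempt above, .nonzero() binds more tightly than ~, so it is applied to (that>=5) before the negation; parenthesising the whole mask first, as below, avoids that.

```python
import numpy as np

this = np.arange(10)
mask = ~(this >= 5)  # True for the elements smaller than 5

# Boolean-mask indexing keeps the elements where mask is True.
via_mask = this[mask].max()

# nonzero() turns the mask into integer positions, which select the same
# elements through integer-array indexing.
via_nonzero = this[mask.nonzero()].max()
```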

Use of None in Array indexing in Python

I am using the LSTM tutorial for Theano (http://deeplearning.net/tutorial/lstm.html). In the lstm.py (http://deeplearning.net/tutorial/code/lstm.py) file, I don't understand the following line:

c = m_[:, None] * c + (1. - m_)[:, None] * c_

What does m_[:, None] mean? In this case m_ is a Theano vector while c is a matrix.
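
Trying the same expression in plain NumPy, with small made-up arrays standing in for the Theano variables, shows what the indexing does to the shapes: None inserts a new axis of length 1, so the vector becomes a column that broadcasts across the columns of the matrix. I am assuming the Theano expression follows the same indexing convention.

```python
import numpy as np

m_ = np.array([1.0, 0.0, 1.0])  # one mask value per row
c = np.ones((3, 2))             # stand-in for the new cell state
c_ = np.zeros((3, 2))           # stand-in for the previous cell state

# None (an alias for np.newaxis) adds an axis: shape (3,) becomes (3, 1).
column = m_[:, None]

# The (3, 1) column broadcasts against the (3, 2) matrices, so each row
# takes c where the mask is 1 and c_ where the mask is 0.
blended = m_[:, None] * c + (1. - m_)[:, None] * c_
```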

Is there a GPU accelerated numpy.max(X, axis=0) implementation in Theano?

Is there a GPU-accelerated version of numpy.max(X, axis=None) in Theano? I looked into the documentation and found theano.tensor.max(X, axis=None), but it is 4-5 times slower than the numpy implementation.

I can assure you, it is not slow because of a bad choice of matrix size. The same matrix under theano.tensor.exp is 40 times faster than its numpy counterpart.

Any suggestions?

Does Theano do automatic unfolding for BPTT?

I am implementing an RNN in Theano and I have difficulties training it. It doesn't even come near to memorising the training corpus. My mistake is most likely caused by me not understanding exactly how Theano copes with backpropagation through time. Right now, my code is as simple as it gets:

grad_params = theano.tensor.grad(cost, params)

My question is: given that my network is recurrent, does this automatically do the unfolding of the architecture into a feed-forward one? On one hand, this example does exactly what I am doing. On the other hand, this thread makes me think I'm wrong.

In case it does do the unfolding for me, how can I truncate it? I can see that there is a way, from the documentation of scan, but I can't come up with the code to do it.

Getting Theano to use the GPU

I am having quite a bit of trouble setting up Theano to work with my graphics card - I hope you guys can give me a hand.

I have used CUDA before and it is properly installed as would be necessary to run Nvidia Nsight. However, I now want to use it with PyDev and am having several problems following the 'Using the GPU' part of the tutorial at http://deeplearning.net/software/theano/install.html#gpu-linux

The first is quite basic, and that is how to set up the environment variables. It says I should 'Define a $CUDA_ROOT environment variable'. Several sources have said to create a new '.pam_environment' file in my home directory. I have done this and written the following:

CUDA_ROOT = /usr/local/cuda-5.5/bin
LD_LIBRARY_PATH = /usr/local/cuda-5.5/lib64/lib

I am not sure if this is exactly the way it has to be written - apologies if this is a basic question. If I could get confirmation that this is indeed the correct place to have written it, too, that would be helpful.

The second problem is in the following part of the tutorial. It says to 'change the device option to name the GPU device in your computer'. Apparently this has something to do with THEANO_FLAGS and .theanorc, but nowhere am I able to find out what these are: are they files? If so where do I find them? The tutorial seems to be assuming some knowledge that I don't have!

Thanks for taking the time to read this: any and all answers are greatly appreciated - I am very much completely stuck at the moment!
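
My current understanding, which may be wrong, is that .theanorc is a plain-text INI-style file in the home directory and that THEANO_FLAGS is an environment variable holding the same settings as comma-separated key=value pairs. The sketch below only illustrates that assumed relationship using Python's standard configparser; the [global] section name and the device and floatX values are what I believe the file should contain, not something I have confirmed.

```python
import configparser

# Assumed contents of ~/.theanorc (a hypothetical example, not verified).
THEANORC_TEXT = """
[global]
device = gpu
floatX = float32
"""

parser = configparser.ConfigParser()
# Keep option names case-sensitive (floatX would otherwise become floatx).
parser.optionxform = str
parser.read_string(THEANORC_TEXT)

# The same settings written the way I assume THEANO_FLAGS expects them.
theano_flags = ",".join(
    "{}={}".format(key, value) for key, value in parser["global"].items()
)
```

If that is right, the flags string would be exported as THEANO_FLAGS before starting Python, instead of (or in addition to) keeping the file.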

Add bias to Lasagne neural network layers

I am wondering if there is a way to add a bias node to each layer in the Lasagne neural network toolkit. I have been trying to find related information in the documentation.

This is the network I built, but I don't know how to add a bias node to each layer.

import lasagne

def build_mlp(input_var=None):
    # This creates an MLP of two hidden layers of 800 units each, followed by
    # a softmax output layer of 2 units. It applies 20% dropout to the input
    # data and 50% dropout to the hidden layers.

    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 60 input features) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 60),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 2 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=2,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
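
To be explicit about what I mean by a bias node, here is a minimal NumPy sketch of a single fully-connected layer with a bias vector b added before the rectifier. The function name dense_forward and the numbers are made up. As far as I can tell, DenseLayer accepts a b argument and may already create such a bias by default, but I have not been able to confirm that from the documentation.

```python
import numpy as np

def dense_forward(x, W, b):
    """Affine map followed by a rectifier: max(x.W + b, 0)."""
    return np.maximum(np.dot(x, W) + b, 0.0)

x = np.array([[1.0, 2.0]])               # one sample with two features
W = np.array([[1.0, -1.0], [0.5, 0.5]])  # weights for two output units
b = np.array([0.5, -1.0])                # one bias value per output unit
out = dense_forward(x, W, b)
```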

