
scipy interview questions

Top scipy frequently asked interview questions

What are the differences between Pandas and NumPy+SciPy in Python? [closed]

They both seem exceedingly similar and I'm curious as to which package would be more beneficial for financial data analysis.
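For context, a minimal sketch of the practical difference as I understand it (the tickers, dates, and prices below are made up):

import numpy as np
import pandas as pd

# NumPy: a plain 2-D array of closing prices -- fast numerics, but no labels.
prices = np.array([[100.0, 20.5],
                   [101.2, 20.9],
                   [ 99.8, 21.1]])

# pandas: the same numbers wrapped with a date index and column names,
# which is what most financial workflows seem to need.
df = pd.DataFrame(prices,
                  index=pd.date_range('2013-01-01', periods=3),
                  columns=['AAPL', 'XYZ'])
print(df.pct_change())   # labeled daily returns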


Source: (StackOverflow)

How do I install SciPy on 64 bit Windows?

How do I install SciPy on my system?

For the NumPy part (that SciPy depends on) there is actually an installer for 64 bit Windows: numpy-1.3.0.win-amd64-py2.6.msi (direct download URL, 2310144 bytes).

Running the SciPy superpack installer results in this message in a dialog box:

Cannot install. Python version 2.6 required, which was not found in the registry.

I already have Python 2.6.2 installed (and a working Django installation in it), but I don't know about any Registry story.

The registry entries seem to already exist:

REGEDIT4

[HKEY_LOCAL_MACHINE\SOFTWARE\Python]

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore]

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6]

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Help]

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Help\Main Python Documentation]
@="D:\\Python262\\Doc\\python262.chm"

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath]
@="D:\\Python262\\"

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath\InstallGroup]
@="Python 2.6"

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Modules]

[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\PythonPath]
@="D:\\Python262\\Lib;D:\\Python262\\DLLs;D:\\Python262\\Lib\\lib-tk"

What I have done so far:

Step 1

Downloaded the NumPy superpack installer numpy-1.3.0rc2-win32-superpack-python2.6.exe (direct download URL, 4782592 bytes). Running this installer resulted in the same message, "Cannot install. Python version 2.6 required, which was not found in the registry." Update: there is actually an installer for NumPy that works - see the beginning of the question.

Step 2

Tried to install NumPy in another way. Downloaded the zip package numpy-1.3.0rc2.zip (direct download URL, 2404011 bytes) and extracted the zip file in the normal way to a temporary directory, D:\temp7\numpy-1.3.0rc2 (where setup.py and README.txt are). I then opened a command line window and ran:

d:
cd D:\temp7\numpy-1.3.0rc2
setup.py install

This ran for a long time and also involved cl.exe (part of Visual Studio). Here is a nearly 5000-line transcript (230 KB).

This seemed to work. I can now do this in Python:

import numpy as np
np.random.random(10)

with this result:

array([ 0.35667511,  0.56099423,  0.38423629,  0.09733172,  0.81560421,
        0.18813222,  0.10566666,  0.84968066,  0.79472597,  0.30997724])

Step 3

Downloaded the SciPy superpack installer, scipy-0.7.1rc3-win32-superpack-python2.6.exe (direct download URL, 45597175 bytes). Running this installer resulted in the message quoted at the beginning.

Step 4

Tried to install SciPy in another way. Downloaded the zip package scipy-0.7.1rc3.zip (direct download URL, 5506562 bytes) and extracted the zip file in the normal way to a temporary directory, D:\temp7\scipy-0.7.1 (where setup.py and README.txt are). I then opened a command line window and ran:

d:
cd D:\temp7\scipy-0.7.1
setup.py install

This did not achieve much - here is a transcript (about 95 lines).

And it fails:

>>> import scipy as sp2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named scipy

Platform: Python 2.6.2 installed in directory D:\Python262, Windows XP 64 bit SP2, 8 GB RAM, Visual Studio 2008 Professional Edition installed.

The startup screen of the installed Python is:

Python 2.6.2 (r262:71605, Apr 14 2009, 22:46:50) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Value of PATH, result from SET in a command line window:

Path=D:\Perl64\site\bin;D:\Perl64\bin;C:\Program Files (x86)\PC Connectivity Solution\;D:\Perl\site\bin;D:\Perl\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;d:\Program Files (x86)\WinSCP\;D:\MassLynx\;D:\Program Files (x86)\Analyst\bin;d:\Python262;d:\Python262\Scripts;D:\Program Files (x86)\TortoiseSVN\bin;D:\Program Files\TortoiseSVN\bin;C:\WINDOWS\system32\WindowsPowerShell\v1.0;D:\Program Files (x86)\IDM Computer Solutions\UltraEdit\

Source: (StackOverflow)


Fast tensor rotation with NumPy

At the heart of an application (written in Python and using NumPy) I need to rotate a 4th order tensor. Actually, I need to rotate a lot of tensors many times and this is my bottleneck. My naive implementation (below) involving eight nested loops seems to be quite slow, but I cannot see a way to leverage NumPy's matrix operations and, hopefully, speed things up. I've a feeling I should be using np.tensordot, but I don't see how.

Mathematically, the elements of the rotated tensor, T', are given by T'_{ijkl} = Σ_{abcd} g_{ia} g_{jb} g_{kc} g_{ld} T_{abcd}, with the sum running over the repeated indices on the right-hand side. T and Tprime are 3x3x3x3 NumPy arrays and the rotation matrix g is a 3x3 NumPy array. My slow implementation (taking ~0.04 seconds per call) is below.

#!/usr/bin/env python

import numpy as np

def rotT(T, g):
    Tprime = np.zeros((3,3,3,3))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    for ii in range(3):
                        for jj in range(3):
                            for kk in range(3):
                                for ll in range(3):
                                    gg = g[ii,i]*g[jj,j]*g[kk,k]*g[ll,l]
                                    Tprime[i,j,k,l] = Tprime[i,j,k,l] + \
                                         gg*T[ii,jj,kk,ll]
    return Tprime

if __name__ == "__main__":

    T = np.array([[[[  4.66533067e+01,  5.84985000e-02, -5.37671310e-01],
                    [  5.84985000e-02,  1.56722231e+01,  2.32831900e-02],
                    [ -5.37671310e-01,  2.32831900e-02,  1.33399259e+01]],
                   [[  4.60051700e-02,  1.54658176e+01,  2.19568200e-02],
                    [  1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
                    [  2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
                   [[ -5.35577630e-01,  1.95558600e-02,  1.31108757e+01],
                    [  1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
                    [  1.31108757e+01, -6.67615000e-03,  6.90486240e-01]]],
                  [[[  4.60051700e-02,  1.54658176e+01,  2.19568200e-02],
                    [  1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
                    [  2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
                   [[  1.57414726e+01, -3.86167500e-02, -1.55971950e-01],
                    [ -3.86167500e-02,  4.65601977e+01, -3.57741000e-02],
                    [ -1.55971950e-01, -3.57741000e-02,  1.34215636e+01]],
                   [[  2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
                    [ -1.49072770e-01, -3.63410500e-02,  1.32039847e+01],
                    [ -7.38843000e-03,  1.32039847e+01,  1.38172700e-02]]],
                  [[[ -5.35577630e-01,  1.95558600e-02,  1.31108757e+01],
                    [  1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
                    [  1.31108757e+01, -6.67615000e-03,  6.90486240e-01]],
                   [[  2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
                    [ -1.49072770e-01, -3.63410500e-02,  1.32039847e+01],
                    [ -7.38843000e-03,  1.32039847e+01,  1.38172700e-02]],
                   [[  1.33639532e+01, -1.26331100e-02,  6.84650400e-01],
                    [ -1.26331100e-02,  1.34222177e+01,  1.67851800e-02],
                    [  6.84650400e-01,  1.67851800e-02,  4.89151396e+01]]]])

    g = np.array([[ 0.79389393,  0.54184237,  0.27593346],
                  [-0.59925749,  0.62028664,  0.50609776],
                  [ 0.10306737, -0.56714313,  0.8171449 ]])

    for i in range(100):
        Tprime = rotT(T,g)

Is there a way to make this go faster? Making the code generalise to other ranks of tensor would be useful, but is less important.
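For comparison, here is a sketch of the same contraction written with np.einsum (available in NumPy 1.6+), using the same index convention as the loop version above, i.e. summing g[a,i]*g[b,j]*g[c,k]*g[d,l]*T[a,b,c,d] over a, b, c, d:

import numpy as np

def rotT_einsum(T, g):
    # One call replaces the eight nested loops;
    # np.allclose(rotT(T, g), rotT_einsum(T, g)) should hold for the arrays above.
    return np.einsum('ai,bj,ck,dl,abcd->ijkl', g, g, g, g, T)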


Source: (StackOverflow)

Prevent anti-aliasing for imshow in matplotlib

When I use matplotlib's imshow() method to represent a small numpy matrix, it ends up doing some smoothing between pixels. Is there any way to disable this? It makes my figures misleading in presentations.

(Figure: a 28x28 matrix plotted with imshow())

The figure above is a 28x28 image, so I should be seeing large squares of single colors representing each pixel (as MATLAB would display it when using imagesc()). But instead, the pixels seem to be blurred together with neighboring pixels. Is there a way to disable this behavior?
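For reference, a minimal sketch of what I am trying, with a random matrix standing in for my data; imshow() accepts an interpolation keyword, which appears to be the relevant knob:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(28, 28)               # stand-in for the real 28x28 matrix
plt.imshow(data, interpolation='nearest')   # 'nearest' shows one solid square per pixel
plt.show()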


Source: (StackOverflow)

Read .mat files in Python

Does anyone have successful experience reading binary Matlab .mat files in Python?

(I've seen that scipy supposedly supports reading .mat files, but I've been unsuccessful with it. I installed scipy version 0.7.0, and I can't find the loadmat() method.)
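For reference, this is the kind of call I was expecting to work (the filename is just a placeholder); loadmat lives in the scipy.io submodule rather than in the top-level scipy namespace:

from scipy.io import loadmat

data = loadmat('measurements.mat')   # placeholder .mat file
print(data.keys())                   # MATLAB variables show up as dictionary entries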


Source: (StackOverflow)

Relationship between scipy and numpy

scipy appears to provide most (but not all [1]) of numpy's functions in its own namespace. In other words, if there's a function named numpy.foo, there's almost certainly a scipy.foo. Most of the time, the two appear to be exactly the same, oftentimes even pointing to the same function object.

Sometimes, they're different. To give an example that came up recently:

  • numpy.log10 is a ufunc that returns NaNs for negative arguments;
  • scipy.log10 returns complex values for negative arguments, and doesn't appear to be a ufunc.

The same can be said about log, log2 and logn, but not about log1p [2].

On the other hand, numpy.exp and scipy.exp appear to be different names for the same ufunc. This is also true of scipy.log1p and numpy.log1p.

Another example is numpy.linalg.solve vs scipy.linalg.solve. They're similar, but the latter offers some additional features over the former.

Why the apparent duplication? If this is meant to be a wholesale import of numpy into the scipy namespace, why the subtle differences in behaviour and the missing functions? Is there some overarching logic that would help clear up the confusion?

[1] numpy.min, numpy.max, numpy.abs and a few others have no counterparts in the scipy namespace.

[2] Tested using numpy 1.5.1 and scipy 0.9.0rc2.
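A minimal sketch of the log10 difference described above, assuming the versions noted in [2] (recent SciPy releases have dropped these duplicated numpy names, so scipy.log10 may no longer exist):

import numpy as np
import scipy as sp

print(np.log10(-1.0))   # nan, plus a RuntimeWarning: numpy's log10 is a real-valued ufunc
print(sp.log10(-1.0))   # approx 1.3643763538418412j: scipy's version switches to complex output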


Source: (StackOverflow)

Installing SciPy with pip

It is possible to install NumPy with pip using pip install numpy.

Is there a similar possibility with SciPy? (Doing pip install scipy does not work.)

Update

SciPy can now be installed with pip, so this question is no longer relevant.


Source: (StackOverflow)

How to read csv into record array in numpy?

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data into an R data frame. Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?
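For reference, one built-in route I have come across is numpy.genfromtxt with names=True, which returns a structured (record-like) array; 'data.csv' here is a placeholder for a file with a header row:

import numpy as np

arr = np.genfromtxt('data.csv', delimiter=',', names=True)
print(arr.dtype.names)           # column names taken from the header row
print(arr[arr.dtype.names[0]])   # columns are then addressable by name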


Source: (StackOverflow)

How to smooth a curve in the right way?

Let's assume we have a dataset which might be given approximately by

import numpy as np
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2

So we have a variation of about 20% in the dataset. My first idea was to use scipy's UnivariateSpline, but the problem is that it does not handle the small noise well. If you consider the frequencies, the background is much smaller than the signal, so a fit of only the frequencies below a cutoff might be an idea, but that would involve a forward and inverse Fourier transform, which might result in bad behaviour. Another way would be a moving average, but that also needs the right choice of the delay (window length).

Any hints, books, or links on how to tackle this problem?
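To make the options concrete, here is a rough sketch of the two ideas mentioned above (the smoothing factor s and the window length w are just ballpark guesses, not tuned values):

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# UnivariateSpline smooths only if s (roughly the allowed sum of squared
# residuals) is set explicitly; the default follows the noise too closely.
spl = UnivariateSpline(x, y, s=len(x) * 0.2**2)
y_spline = spl(x)

# Simple moving average for comparison; w is the window length to choose.
w = 5
y_movavg = np.convolve(y, np.ones(w) / w, mode='same')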



Source: (StackOverflow)

Invertible STFT and ISTFT in Python

Is there any general-purpose form of short-time Fourier transform with corresponding inverse transform built into SciPy or NumPy or whatever?

There's the pyplot specgram function in matplotlib, which calls ax.specgram(), which calls mlab.specgram(), which calls _spectral_helper():

#The checks for if y is x are so that we can use the same function to
#implement the core of psd(), csd(), and spectrogram() without doing
#extra calculations.  We return the unaveraged Pxy, freqs, and t.

but

This is a helper function that implements the commonality between psd, csd, and spectrogram. It is NOT meant to be used outside of mlab

I'm not sure if this can be used to do an STFT and ISTFT, though. Is there anything else, or should I translate something like these MATLAB functions?

I know how to write my own ad-hoc implementation; I'm just looking for something full-featured, which can handle different windowing functions (but has a sane default), is fully invertible with COLA windows (istft(stft(x))==x), tested by multiple people, no off-by-one errors, handles the ends and zero padding well, fast RFFT implementation for real input, etc.
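Update: this question predates it, but SciPy 0.19 and later ship scipy.signal.stft and scipy.signal.istft. A minimal round-trip sketch with the default Hann window and 50% overlap (which satisfies COLA):

import numpy as np
from scipy.signal import stft, istft   # available in SciPy >= 0.19

x = np.random.randn(1000)
f, t, Z = stft(x, fs=1.0, nperseg=256)   # Hann window, 50% overlap by default
_, x_rec = istft(Z, fs=1.0, nperseg=256)

print(np.max(np.abs(x - x_rec[:len(x)])))   # should be tiny, i.e. the transform inverts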


Source: (StackOverflow)

Python web hosting: Numpy, Matplotlib, Scientific Computing

I write scientific software in Numpy/Scipy/Matplotlib. Having developed applications on my home computer, I am now interested in writing simple web applications. Example: user uploads image or audio file, my program processes it using Numpy/Scipy, and output is displayed on the browser using Matplotlib, or perhaps the user can download a processed file.

I already pay for hosting that does have Python 2.4.3 installed, but no Numpy/Scipy. I don't have shell access via command line, either. Just drag-and-drop FTP. Pretty limited, but I can get simple Python/CGI scripts working.

Surprisingly, a web search revealed few suitable options for web hosting with these capabilities already built in. (Please guide me if I am wrong.) I am learning about the Google App Engine, but I still don't have a full understanding about its tools and limitations. What the web did tell me is that others have similar concerns.

Hoping for solutions, I thought I would ask these simple questions to the awesome SO community:

  1. Is there a simple way of installing numpy (or any third-party package/library) onto my already hosted space? I know the Python path on my hosted space, and I know the relevant Python/Numpy directories on my home computer. Can I simply copy files over and have it work? Both local and remote systems run Ubuntu.

  2. What hosting sites exist (either free or paid) which have Numpy/Matplotlib installed or, if not installed, the possibility of installing it? Are there any documented sites that you can reference with working applications, no matter how simple?

  3. Can Google App Engine help me in any way? Or is it totally for something else? Have you or others used it to write scientific applications in Python/Numpy? If so, could you reference them?

Thank you for your help.

EDIT: After the useful answers below, I bought the $20 plan at Slicehost, and I love it so far! (I first tried Amazon EC2. I must be stupid, but I just couldn't get it to work.) Setting up the Ubuntu server with Apache took mere hours (and I'm an Apache novice). It allows me to do exactly what I wanted with Python plus much more. I now have my own remote repository for version control, too. Thanks again!

EDIT 2: Nearly two years later, I tried Linode and EC2 (again). Linode is great. EC2 seemed easier this time around -- maybe it's just added experience, or maybe it's the improvements that Amazon made to the AWS management console. For those interested in Numpy/Scipy/Matplotlib/Audiolab, here is my Ubuntu cheat sheet whenever I launch an EC2 instance:

ec2:~$ sudo aptitude install build-essential python-scipy ipython \
       python-matplotlib python-dev python-setuptools libsndfile-dev \
       libasound2-dev mysql-server python-mysqldb

Upload scikits.audiolab-0.11.0

ec2:~/scikits.audiolab-0.11.0$ sudo python setup.py install

ec2:~$ sudo rm -rf scikits.audiolab-0.11.0

ec2:~$ nano .ipython/ipy_user_conf.py

ip.ex('import matplotlib; matplotlib.use("Agg"); import scipy, pylab, scipy.signal as sig, scipy.linalg as lin, scipy.sparse as spar, os, sys, MySQLdb, boto; from scikits import audiolab')

import ipy_greedycompleter

import ipy_autoreload

Source: (StackOverflow)

How to normalize a NumPy array to within a certain range?

After doing some processing on an audio or image array, the array needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose way to do this, ideally a convenience function? matplotlib.colors.Normalize() doesn't seem to be related.
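What I have in mind is something like the small helper below (just a sketch; note it is a plain min/max rescale over the whole array, unlike the per-channel max-abs scaling in the snippet above):

import numpy as np

def rescale(a, new_min=0.0, new_max=1.0):
    """Linearly map the values of a onto [new_min, new_max]."""
    a = np.asarray(a, dtype=float)
    old_min, old_max = a.min(), a.max()
    return (a - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

audio = np.random.randn(1000, 2)              # made-up stereo data
audio_scaled = rescale(audio, -1.0, 1.0)
image_scaled = rescale(np.random.rand(28, 28), 0, 255)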


Source: (StackOverflow)

Multivariate linear regression in Python

I can't seem to find any Python libraries that do multivariate regression. The only ones I can find do only simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).

For example, with this data:

print 'y        x1      x2       x3       x4      x5     x6       x7'
for t in texts:
    print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" /
   .format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)

(output for above:)

      y        x1       x2       x3        x4     x5     x6       x7
   -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
   -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
  -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
   -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
   -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
   -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
   -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
   -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
   -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

How would I regress these in python, to get the linear regression formula:

Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + c
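For what it's worth, a sketch of how this could be set up with plain NumPy least squares (the arrays here are random placeholders; in practice they would be the columns of the table above):

import numpy as np

n = 9                                  # number of rows, as in the table above
y = np.random.randn(n)                 # placeholder dependent variable
X = np.random.randn(n, 7)              # placeholder columns x1..x7

# Append a column of ones so the intercept c is fitted as well.
A = np.column_stack([X, np.ones(n)])
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y)
a1_to_a7 = coeffs[:7]
c = coeffs[7]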


Source: (StackOverflow)

Should I use `scipy.pi`, `numpy.pi`, or `math.pi`?

In a project using scipy and numpy, should I use scipy.pi, numpy.pi, or math.pi?

Ten minutes on Google and Stack Overflow didn't give an answer.
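For what it's worth, a quick check with the versions contemporary to this question (scipy.pi was simply re-exported from numpy; recent SciPy releases deprecate that alias):

import math
import numpy as np
import scipy

print(math.pi == np.pi == scipy.pi)   # True: all three are the same Python float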


Source: (StackOverflow)

binning data in python with scipy/numpy

Is there a more efficient way to take an average of an array in prespecified bins? For example, I have an array of numbers and an array corresponding to bin start and end positions in that array, and I just want to take the mean within those bins. I have code that does it below, but I am wondering how it can be cut down and improved. Thanks.

from scipy import *
from numpy import *

def get_bin_mean(a, b_start, b_end):
    # mean of the values of a that fall in [b_start, b_end)
    ind_upper = nonzero(a >= b_start)[0]
    a_upper = a[ind_upper]
    a_range = a_upper[nonzero(a_upper < b_end)[0]]
    mean_val = mean(a_range)
    return mean_val


data = rand(100)
bins = linspace(0, 1, 10)
binned_data = []

n = 0
for n in range(0, len(bins)-1):
    b_start = bins[n]
    b_end = bins[n+1]
    binned_data.append(get_bin_mean(data, b_start, b_end))

print binned_data
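One trick I have seen (a sketch, not necessarily the fastest possible) is to let numpy.histogram do the work, using the data themselves as weights to get per-bin sums:

import numpy as np

data = np.random.rand(100)
bins = np.linspace(0, 1, 10)

sums, _ = np.histogram(data, bins=bins, weights=data)   # sum of values per bin
counts, _ = np.histogram(data, bins=bins)               # number of values per bin
bin_means = sums / counts   # NaN (with a warning) for any empty bin
print(bin_means)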

Source: (StackOverflow)