EzDevInfo.com

zipline

Zipline, a Pythonic Algorithmic Trading Library

zipline error. No module named zipline

I installed the zipline package via Enthought Canopy. Now when I try to run a script that uses it from the command prompt, I get the error ImportError: No module named zipline.

I also tried to run the same code using IPython, with the same output.

I think it is related to Python virtual environments, but I don't know how to fix it.
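One likely cause, consistent with the virtual-environment suspicion above, is that the command prompt and IPython start a different Python interpreter from the one Canopy installed zipline into. The sketch below prints which interpreter is running and whether it can locate a module. It assumes Python 3.4+ (where `importlib.util.find_spec` exists), while the original setup was probably Python 2.7. The helper name `describe_interpreter` is hypothetical.

```python
import importlib.util
import sys


def describe_interpreter(module_name="zipline"):
    """Report which interpreter is running and whether it can find module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the running Python binary
        "found": spec is not None,     # True if the module is importable here
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Interpreter:", info["executable"])
    print("zipline importable:", info["found"], info["location"])
```

Run it once from the command prompt and once from Canopy's own shell. If the executable paths differ, the script is being run by an interpreter that does not have zipline installed.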


Source: (StackOverflow)

Unable to correctly load data from csv file into zipline

I was trying zipline out with my own CSV file. I took the data that is downloaded from Yahoo in the default case and copied it into a CSV file.

The file format is:

Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,409.399971,412.499989,408.999989,411.229973,75555200,54.934461

But when I print the data being passed to handle_data, I get this:

{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 409.39997099999999, 'sid': 'Open'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 412.49998900000003, 'sid': 'High'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 408.99998900000003, 'sid': 'Low'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 411.22997299999997, 'sid': 'Close'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 75555200.0, 'sid': 'Volume'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 54.934460999999999, 'sid': 'Adj Close'}
{'volume': 1000, 'dt': Timestamp('2012-01-04 00:00:00+0000', tz='UTC'), 'price': 410.00001099999997, 'sid': 'Open'}

BarData({
    'Volume': SIDData({'volume': 1000, 'sid': 'Volume', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 75555200.0}),
    'Adj Close': SIDData({'volume': 1000, 'sid': 'Adj Close', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 54.934461}),
    'High': SIDData({'volume': 1000, 'sid': 'High', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 412.499989}),
    'Low': SIDData({'volume': 1000, 'sid': 'Low', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 408.999989}),
    'Close': SIDData({'volume': 1000, 'sid': 'Close', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 411.229973}),
    'Open': SIDData({'volume': 1000, 'sid': 'Open', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 409.399971})
})

The code that loads the file is simply:

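import pandas as pd
import pytz
from zipline.algorithm import TradingAlgorithm

# initialize and handle_data are defined elsewhere in my script (not shown)
data = pd.read_csv('appleDataFromYahoo.csv', index_col='Date', parse_dates=True)
data.index = data.index.tz_localize(pytz.UTC)
data.head()
algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data, capital_base=1000000)

results = algo.run(data)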

The data in my logs is clearly wrong, because in the default case handle_data prints something like this:

BarData({'AAPL': SIDData({'high': 44.118027429170454, 'open': 43.50086143458919, 'price': 44.025853000000005, 'volume': 111284600, 'low': 43.393986829608814, 'sid': 'AAPL', 'source_id': 'DataPanelSource-714fe8ac9fdca11967199c1edefb9597', 'close': 44.025853000000005, 'dt': Timestamp('2011-01-03 00:00:00+0000', tz='UTC'), 'type': 4})})

Can someone please point me in the direction of what I might be doing wrong? Is my file format wrong? Do I need to provide another field specifying the symbol ID, or something else?
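The two outputs can be compared directly. The CSV run produced one sid per column (Open, High, Low, Close, Volume, Adj Close), while the default run produced a single 'AAPL' sid. So the DataFrame source appears to treat every column as a separate asset. The sketch below assumes you only need one price series per symbol. It keeps a single price column and names it after the ticker. The helper name `to_price_frame` is hypothetical, and the sample row is the one from the question.

```python
import io

import pandas as pd

# One row in the question's CSV layout (copied from the question).
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,409.399971,412.499989,408.999989,411.229973,75555200,54.934461
"""


def to_price_frame(csv_source, symbol, price_column="Close"):
    """Keep one price column from a Yahoo-style OHLCV CSV, named after the ticker.

    Assumption (from the log above): each DataFrame column becomes one sid,
    so the result has exactly one column, with a UTC-labeled DatetimeIndex.
    """
    raw = pd.read_csv(csv_source, index_col="Date", parse_dates=True)
    frame = raw[[price_column]].rename(columns={price_column: symbol})
    frame.index = frame.index.tz_localize("UTC")
    return frame


prices = to_price_frame(io.StringIO(CSV_TEXT), "AAPL")
print(prices)
```

You would then pass `prices` to `algo.run(...)` and read `data['AAPL'].price` in handle_data, mirroring the default output. Whether to use Close or Adj Close (`price_column="Adj Close"`) depends on whether you want split- and dividend-adjusted prices.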


Source: (StackOverflow)


Bug of adfuller test with zipline

I'm running an ADF test with statsmodels' stattools. My code is:

import statsmodels.tsa.stattools as ts  
from datetime import datetime

import zipline  
from zipline import TradingAlgorithm  
from zipline.api import order_target, record, symbol, history, add_history

data = get_pricing(['AAPL'],start_date='2015-01-13',end_date = '2015-02-13',frequency='daily')  
result = ts.adfuller(data.price.values,1)

However, I've got the following error,

ValueError                                Traceback (most recent call last)  
<ipython-input-45-44774ff05797> in <module>()  
      7  
      8 data = get_pricing(['AAPL'],start_date='2015-01-13',end_date = '2015-02-13',frequency='daily')  
----> 9 result = ts.adfuller(data.price.values,1)

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
    209  
    210     xdiff = np.diff(x)  
--> 211     xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')  
    212     nobs = xdall.shape[0]  # pylint: disable=E1103  
    213 

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/tsatools.pyc in lagmat(x, maxlag, trim, original)
    322     if x.ndim == 1:  
    323         x = x[:,None]  
--> 324     nobs, nvar = x.shape  
    325     if original in ['ex','sep']:  
    326         dropidx = nvar

ValueError: too many values to unpack

I can't see the problem. I think the parameters to adfuller are OK.

Could anyone help? Thanks a lot.
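The traceback shows the failure path. adfuller computes `xdiff = np.diff(x)` and then `xdiff[:, None]`. lagmat then unpacks `nobs, nvar = x.shape`, which only works for a 2-D array. That breaks if `x` arrives 2-D, for example with shape (n, 1). You would get that shape if passing a list (`['AAPL']`) to get_pricing makes `data.price` a one-column DataFrame rather than a Series. That last point is an assumption about get_pricing that I have not verified. The sketch below reproduces the shape problem with plain NumPy and flattens the input. `as_1d_series` is a hypothetical helper name.

```python
import numpy as np


def as_1d_series(values):
    """Return values as a 1-D float array, flattening an (n, 1) column."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.ravel()
    if arr.ndim != 1:
        raise ValueError("expected one price series, got shape %r" % (arr.shape,))
    return arr


# Dummy prices shaped like a one-column DataFrame's .values: (4, 1).
prices_2d = np.array([[1.0], [2.0], [4.0], [3.0]])

# The expression adfuller builds internally: on the 2-D input it is 3-D,
# shape (4, 1, 0), so "nobs, nvar = x.shape" in lagmat cannot unpack it.
print(np.diff(prices_2d)[:, None].shape)

# After flattening, the same expression is 2-D, as lagmat expects.
print(np.diff(as_1d_series(prices_2d))[:, None].shape)
```

With the helper, the call would become `ts.adfuller(as_1d_series(data.price.values), 1)`. Passing the single ticker string `'AAPL'` instead of a list may also give a 1-D series directly.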


Source: (StackOverflow)

zipline backtesting with non-US (China) data

I'm teaching myself zipline and Python. I have learned a very simple trading strategy, and now I want to backtest it. A few days ago I found zipline, but my data structure is different from the US data. The data structure is below (stock 'a'):

             open  high  low  close  volume  ma_5  ma_10  (etc.)
index
2015-01-01    1.5   2.6  1.2    2.0    1000   1.2    1.3
2015-01-03    1.5   2.6  1.2    2.0    1000   1.2    1.3

I just want to run a very simple example, like the following code:

from datetime import datetime

from zipline.algorithm import TradingAlgorithm
from zipline.api import order, record, symbol

import pandas as pd
import data as dt

def initialize(context):
    pass

def handle_data(context, data):
    order('close', 10)
    # the DataFrame does not contain 'price'; how do I replace AAPL?
    record(AAPL=data['close'].price)   

dat = dt.get_data('600848')   # our country's stock code; returns a DataFrame with the structure above
algo_obj = TradingAlgorithm(initialize=initialize, 
                            handle_data=handle_data)

# Run algorithm
perf_manual = algo_obj.run(dat)

I know there are many errors in the code. I have read a similar question about European data asked by someone else, but it is too complex for me right now. Could anyone show me how much work is needed to fix the code, or should I look for another package that is better suited to China?
My goal is to order stock code 600848 at its close price on 2015-01-01.
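The other questions on this page suggest that zipline's DataFrame input treats each column as one asset (sid) and each row as one bar. Under that reading, a frame with open/high/low/close/ma_5 columns for a single stock would be read as several separate assets. The sketch below assumes you only need the close price. It builds a frame with one column per stock code, holding that stock's close, with dates labeled as UTC. `close_prices_by_code` is a hypothetical helper, and the two sample rows are the ones shown in the question.

```python
import pandas as pd


def close_prices_by_code(frames):
    """Build a wide frame: one 'close' column per stock code, UTC DatetimeIndex.

    frames maps a stock code (e.g. '600848') to a DataFrame shaped like the
    one in the question (open/high/low/close/volume/... columns, date index).
    """
    wide = pd.concat({code: df["close"] for code, df in frames.items()}, axis=1)
    wide.index = pd.DatetimeIndex(wide.index).tz_localize("UTC")
    return wide.sort_index()


# The two rows shown in the question for one stock.
stock = pd.DataFrame(
    {"open": [1.5, 1.5], "high": [2.6, 2.6], "low": [1.2, 1.2],
     "close": [2.0, 2.0], "volume": [1000, 1000],
     "ma_5": [1.2, 1.2], "ma_10": [1.3, 1.3]},
    index=pd.to_datetime(["2015-01-01", "2015-01-03"]),
)
prices = close_prices_by_code({"600848": stock})
print(prices)
```

handle_data could then call `order('600848', 10)` and `record(price_600848=data['600848'].price)`. That usage mirrors the string sids used elsewhere on this page (e.g. `self.order('SPY', 1)`), but I have not checked it against a specific zipline version.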


Source: (StackOverflow)

backtest with local data in zipline

I am using zipline to backtest with local data, but it doesn't seem to work.
from datetime import datetime
import pytz
import pandas as pd

from zipline.algorithm import TradingAlgorithm

import zipline.utils.factory as factory


class BuyApple(TradingAlgorithm):

    def handle_data(self, data):
        self.order('AAPL', 1)


if __name__ == '__main__':
    data = pd.read_csv('AAPL.csv')

    simple_algo = BuyApple()

    results = simple_algo.run(data)

Above is my code. When I run this script, I get this message:

[2015-04-03 01:41:53.712035] WARNING: Loader: No benchmark data found for date range.
start_date=2015-04-03 00:00:00+00:00, end_date=2015-04-03 01:41:53.632300, url=http://ichart.finance.yahoo.com/table.csv?a=3&c=2015&b=3&e=3&d=3&g=d&f=2015&s=%5EGSPC
Traceback (most recent call last):
  File "bollinger.py", line 31, in <module>
    results = simple_algo.run(data)
  File "/home/xinzhou/.local/lib/python2.7/site-packages/zipline-0.7.0-py2.7.egg/zipline/algorithm.py", line 372, in run
    source = DataFrameSource(source)
  File "/home/xinzhou/.local/lib/python2.7/site-packages/zipline-0.7.0-py2.7.egg/zipline/sources/data_frame_source.py", line 42, in __init__
    assert isinstance(data.index, pd.tseries.index.DatetimeIndex)
AssertionError

Then I changed my code to the following:

from datetime import datetime
import pytz
import pandas as pd

from zipline.algorithm import TradingAlgorithm


import zipline.utils.factory as factory


class BuyApple(TradingAlgorithm):

    def handle_data(self, data):
        self.order('AAPL', 1)


if __name__ == '__main__':
    start = datetime(2000, 1, 9, 14, 30, 0, 0, pytz.utc)

    end = datetime(2001, 1, 10, 21, 0, 0, 0, pytz.utc)

    data = pd.read_csv('AAPL.csv', parse_dates=True, index_col=0)

    sim_params = factory.create_simulation_parameters(
       start=start, end=end, capital_base=10000)
    sim_params.data_frequency = '1d'
    sim_params.emission_rate = '1d'

    simple_algo = BuyApple()

    results = simple_algo.run(data)

The

assert isinstance(data.index, pd.tseries.index.DatetimeIndex)
AssertionError

is gone, but my terminal keeps showing this message:

[2015-04-03 01:44:28.141657] WARNING: Loader: No benchmark data found for date range.
start_date=2015-04-03 00:00:00+00:00, end_date=2015-04-03 01:44:28.028243, url=http://ichart.finance.yahoo.com/table.csv?a=3&c=2015&b=3&e=3&d=3&g=d&f=2015&s=%5EGSPC

How to solve this problem? Thanks.
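The remaining message is logged as a WARNING by zipline's benchmark loader rather than raised as an error. Separately, the second script builds `sim_params` from `start` and `end` but never uses it afterwards, so those dates do not restrict the run. The pandas sketch below keeps the fix that removed the AssertionError (parsing the first column as a DatetimeIndex). It also labels naive dates as UTC and applies the start/end window by slicing the frame itself. `load_local_prices` is a hypothetical helper, and the CSV content is dummy data. The `AAPL` column name is an assumption about how the `'AAPL'` sid in `self.order('AAPL', 1)` gets matched.

```python
import io
from datetime import datetime, timezone

import pandas as pd


def load_local_prices(csv_source, start, end):
    """Read a local price CSV and keep only rows between start and end.

    The first column is parsed into a DatetimeIndex (zipline 0.7's
    DataFrameSource asserts this), and naive timestamps are labeled as UTC.
    """
    frame = pd.read_csv(csv_source, index_col=0, parse_dates=True)
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError("first CSV column did not parse as dates")
    if frame.index.tz is None:
        frame.index = frame.index.tz_localize("UTC")
    return frame.sort_index().loc[start:end]


# Dummy CSV in a "Date,<symbol>" layout; the values are placeholders.
DUMMY_CSV = """Date,AAPL
2000-01-03,1.0
2000-01-10,2.0
2001-01-10,3.0
2001-02-01,4.0
"""

start = datetime(2000, 1, 9, 14, 30, tzinfo=timezone.utc)
end = datetime(2001, 1, 10, 21, 0, tzinfo=timezone.utc)
data = load_local_prices(io.StringIO(DUMMY_CSV), start, end)
print(data)
```

`simple_algo.run(data)` would then only see bars inside the window you defined.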


Source: (StackOverflow)

Pandas column A > B, execute trade with Zipline

I have a problem trying to backtest a financial idea I've got with Zipline.

What I've got is the closing price for SPY in the first column, data['SPY'], and a "homemade" signal, data['Signal']. The signal works similarly to a moving average and is either below or above data['SPY'].

data = pd.read_csv('data.csv', index_col='Date', parse_dates=True)

Now the trading logic is not working, since if I understand it correctly, pandas does not make computations by rows but instead computes on the whole dataset at once.

    def handle_data(self, data):
        if data['SPY'] - data['Signal'] > 0:
            self.order('SPY', 1)
        else:
            self.order('SPY', -1)

The non-working code above expresses the logic I want to perform: if SPY > Signal then buy, and sell if the opposite is true. I've tried all kinds of variations but just can't make it work.

Can anyone help a struggling amateur trader out?

#

I managed to get a bit further. What I needed to do was to add a "transform" to my dataframe, in this case a Moving Average. Since I don't want to actually use a moving average, I set the number of days to 1, which in effect should mean that I use the original numbers.

from zipline.algorithm import TradingAlgorithm
from zipline.transforms import MovingAverage


class Test(TradingAlgorithm):
    def initialize(self):
        # The moving average that I don't really want to use (1-bar window).
        self.add_transform(MovingAverage, 'dummy', ['price'], window_length=1)
        self.pos_long = False
        self.pos_short = False

    def handle_data(self, data):
        if data['SPY'].dummy['price'] >= data['Signal'].dummy['price'] and not self.pos_long:
            self.order('SPY', 100)
            self.pos_long = True
            self.pos_short = False
        elif data['SPY'].dummy['price'] <= data['Signal'].dummy['price'] and not self.pos_short:
            self.order('SPY', -100)
            self.pos_long = False
            self.pos_short = True

The above code works almost exactly as I'd like; the only quirk is that for some reason it doesn't take any short positions. But it's at least a step in the right direction and might help others out.
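This may explain the missing shorts. `self.order('SPY', -100)` sells 100 shares, so when you are long 100 it only takes you back to flat. The flags then let the next signal buy 100 again, so the position swings between +100 and 0. To flip from long 100 to short 100, you would need to sell 200. The plain-Python sketch below uses no zipline and computes each order as the target position minus the current position. `target_positions` and `orders_to_reach` are hypothetical helpers, and the prices are dummy values.

```python
from itertools import accumulate


def target_positions(spy, signal, size=100):
    """Desired share position per bar: long `size` if SPY >= Signal, else short `size`."""
    return [size if s >= g else -size for s, g in zip(spy, signal)]


def orders_to_reach(targets):
    """Order quantity for each bar = desired position minus current position."""
    current = 0
    orders = []
    for target in targets:
        orders.append(target - current)
        current = target
    return orders


# Dummy closes and signal values, just to exercise the logic.
spy = [10, 11, 9, 8, 12]
signal = [10, 10, 10, 10, 10]

targets = target_positions(spy, signal)   # [100, 100, -100, -100, 100]
orders = orders_to_reach(targets)         # [100, 0, -200, 0, 200]

# Fixed +/-100 orders on each flip only swing between long and flat:
naive_orders = [100, -100, 100]
print(list(accumulate(naive_orders)))     # [100, 0, 100]
print(orders, list(accumulate(orders)))
```

zipline.api also exposes order_target (it is imported in the ADF question on this page), which performs this target-minus-current arithmetic for you. Check whether it is available on a TradingAlgorithm subclass in your zipline version.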

These two links below helped me figure this bit out.

http://zipline.readthedocs.org/en/latest/zipline.transforms.html

http://nbviewer.ipython.org/github/twiecki/financial-analysis-python-tutorial/blob/master/3.%20Backtesting%20using%20Zipline.ipynb


Source: (StackOverflow)

Any way to get 1 minute data using Zipline

I am using Zipline in an IPython Notebook to backtest. However, I am fairly new to the library and was wondering if there is any way to add 1-minute data. Currently I am able to get daily open, high, low, close, etc. using the following code:

start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2015,6,30,0,0,0,0, pytz.utc)

data = load_bars_from_yahoo(stocks=["AAPL"], start=start, end=end); data.save('talk_px.dat')

Is there any way I could change the frequency from 1 day to 1 minute?


Source: (StackOverflow)

Zipline - csv file

I am trying to figure out how to use my own CSV data files (originally from Yahoo Finance) in Zipline. I know you need to load the CSV file into a pandas DataFrame, but I can't seem to stop Zipline from downloading the data from Yahoo.

My csv file format:

Date, Open, High, Low, Close, Volume, AdjClose

My algofile:

from zipline.api import order, record, symbol
import pandas as pd

data = pd.read_csv('AAPL.csv')

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)

My command line to create the pickle file:

run_algo.py -f E:\..\Main.py --start 2011-1-1 --end 2015-1-1 -o buyapple_out.pickle

Command line output:

[2015-03-27 10:18:20.809959] WARNING: Loader: No benchmark data found for date range.
start_date=2015-03-27 00:00:00+00:00, end_date=2015-03-27 10:18:19.973911, url=http://ichart.finance.yahoo.com/table.csv?a=2&s=%5EGSPC&b=27&e=27&d=2&g=d&f=2015&c=2015
[2015-03-27 10:20:05.811965] INFO: Performance: Simulated 504 trading days out of 504.
[2015-03-27 10:20:05.811965] INFO: Performance: first open: 2013-01-02 14:31:00+00:00
[2015-03-27 10:20:05.811965] INFO: Performance: last close: 2014-12-31 21:00:00+00:00

My pickle file is created correctly, but Zipline appears to still be using Yahoo instead of my CSV, because the command-line output mentions Yahoo Finance. There seems to be no Zipline documentation on how to do this, other than 'load the CSV into a DataFrame'. What else is needed?

Many thanks.
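The Yahoo request in the log seems to be for the benchmark rather than for your AAPL data. The WARNING comes from the Loader's benchmark step, and decoding the URL's query string shows the symbol parameter `s` is `^GSPC` (Yahoo's ticker for the S&P 500 index). The sketch below uses only Python 3's standard-library `urllib.parse`. It does not show where the AAPL prices in the run came from. Note, though, that in the algofile above, the DataFrame loaded by `pd.read_csv` is never referenced by initialize or handle_data.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the WARNING line in the log above.
URL = ("http://ichart.finance.yahoo.com/table.csv"
       "?a=2&s=%5EGSPC&b=27&e=27&d=2&g=d&f=2015&c=2015")

query = parse_qs(urlsplit(URL).query)
print(query["s"])  # ['^GSPC']
```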


Source: (StackOverflow)

Get last date in each month of a time series pandas

Currently I'm generating a DatetimeIndex using the function zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but has some gaps.

My goal is to get the last date in the DateTimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, since they give the last calendar day of the month rather than the last date actually present in each month.

For example, if this is my time series, I would want to select '2015-05-29', while the last day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']
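One way to do this is to flag each date whose successor falls in a different calendar month. After sorting, the last observation of every month is exactly such a date, plus the final date in the index. The sketch below uses the example dates from the question. Labeling them `tz='UTC'` is an assumption about what get_trading_days returns; the function also works with naive dates. `last_date_per_month` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def last_date_per_month(index):
    """Return the last date actually present in `index` for each calendar month."""
    index = pd.DatetimeIndex(index).sort_values()
    if len(index) == 0:
        return index
    # Encode each date's calendar month as one integer (year * 12 + month).
    months = np.asarray(index.year * 12 + index.month)
    # A date is its month's last observation if the next date is in another month;
    # the final date is always the last observation of its month.
    is_last = np.append(months[1:] != months[:-1], True)
    return index[is_last]


dates = pd.DatetimeIndex(['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
                          '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
                          '2015-05-29', '2015-06-01'], tz='UTC')
print(last_date_per_month(dates))
```

For the example dates, the result contains 2015-05-29 and 2015-06-01. Note that the last month in the index is incomplete, so its "last date" is simply the final date available.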


Source: (StackOverflow)

zipline backtesting using non-US (European) intraday data

I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:

                        BARC    HSBA     LLOY     STAN
Date                                                  
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00

I've followed the moving-averages tutorial here, replacing "AAPL" with my own symbol codes and the history calls with "1m" data instead of "1d".

Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.

However, all sorts of problems arise related to TradingEnvironment. According to the source code:

# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
#       from zipline.finance import trading
#       trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
#       lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
#       with lse:
# the code here will have lse as the global trading.environment
#           algo.run(start, end)

However, using the context manager doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close, the times are for the US market).

My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?

N.B. I've seen that there are some tests on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but these appear to handle data only at the daily level. I would need to fix:

  • open/close times
  • reference data for the benchmark
  • and probably more ...
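One small piece of this can be done in pandas: zipline works with UTC timestamps. Assuming the DataFrame's naive timestamps are London local time (the question doesn't say), they can be labeled as Europe/London and converted to UTC. During British Summer Time, 08:30 in London is 07:30 UTC. The sketch uses the first rows from the question. It only fixes the timestamps; it does not change zipline's US open/close times or benchmark, which the list above identifies as the remaining work.

```python
import pandas as pd

# First rows of the DataFrame in the question (assumed to be London local time).
frame = pd.DataFrame(
    {"BARC": [321.250, 321.150, 321.075],
     "HSBA": [894.55, 894.70, 894.80],
     "LLOY": [112.105, 112.095, 112.140],
     "STAN": [1777.25, 1777.00, 1776.50]},
    index=pd.to_datetime(["2014-07-01 08:30:00", "2014-07-01 08:32:00",
                          "2014-07-01 08:34:00"]),
)
frame.index.name = "Date"

# Label the naive timestamps as London time, then convert them to UTC.
frame.index = frame.index.tz_localize("Europe/London").tz_convert("UTC")
print(frame.index[0])  # 2014-07-01 07:30:00+00:00
```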

Source: (StackOverflow)

What does the @ symbol do in iPython/Python [duplicate]


Some code I'm reading uses @batch_transform. What does the @ symbol do? Is it IPython-specific?

from zipline.transforms import batch_transform
from scipy import stats

@batch_transform
def regression_transform(data):
    pep_price = data.price['PEP']
    ko_price = data.price['KO']
    slope, intercept, _, _, _ = stats.linregress(pep_price, ko_price)

    return intercept, slope
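Here `@` is ordinary Python decorator syntax, not something specific to IPython. Writing `@batch_transform` on the line above `def regression_transform(...)` is equivalent to defining the function and then running `regression_transform = batch_transform(regression_transform)`. The decorator receives the function, and whatever it returns replaces it. What batch_transform itself does with the function is zipline-specific and not shown here. The toy decorator `log_calls` below is hypothetical and only illustrates the mechanism.

```python
import functools


def log_calls(func):
    """A toy decorator: record each call's positional arguments, then run func."""
    calls = []

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@log_calls
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


# The @ line is shorthand for this explicit reassignment:
subtract = log_calls(subtract)

print(add(2, 3), add.calls)               # 5 [(2, 3)]
print(subtract(5, 1), subtract.__name__)  # 4 subtract
```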

Source: (StackOverflow)

Python: KeyError 'shift'

I am new to Python and try to modify a pair trading script that I found here: https://github.com/quantopian/zipline/blob/master/zipline/examples/pairtrade.py

The original script is designed to use only prices. I would like to use returns to fit my models and prices for the invested quantity, but I don't see how to do it.

I have tried:

  • to define a DataFrame of returns in the main block and use it in run
  • to define a DataFrame of returns in the main block as a global object and use it where needed in handle_data
  • to define a DataFrame of returns directly in handle_data

I assume the last option is the most appropriate, but then I get an error with the pandas 'shift' attribute.

More specifically, I try to define 'DataRegression' as follows:

DataRegression = data.copy()
DataRegression[Stock1]=DataRegression[Stock1]/DataRegression[Stock1].shift(1)-1
DataRegression[Stock2]=DataRegression[Stock2]/DataRegression[Stock2].shift(1)-1
DataRegression[Stock3]=DataRegression[Stock3]/DataRegression[Stock3].shift(1)-1
DataRegression = DataRegression.dropna(axis=0)

where 'data' is a DataFrame containing prices, and Stock1, Stock2 and Stock3 are column names defined globally. Those lines in handle_data raise the error:

File "A:\Apps\Python\Python.2.7.3.x86\lib\site-packages\zipline-0.5.6-py2.7.egg\zipline\utils\protocol_utils.py", line 85, in __getattr__
return self.__internal[key]
KeyError: 'shift'

Would anyone know why and how to do that correctly?

Many Thanks, Vincent
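The traceback ends inside zipline's protocol_utils.py `__getattr__`, not inside pandas. That suggests that, inside handle_data, the name `data` refers to handle_data's own `data` parameter (zipline's per-bar event object), which shadows the global price DataFrame, and that object has no `shift`. The sketch below assumes the price DataFrame is available before calling run. It computes the returns once with pandas outside handle_data and keeps them under a different name. `simple_returns` is a hypothetical helper, and the prices are dummy values.

```python
import pandas as pd


def simple_returns(prices, columns):
    """Period-over-period simple returns, p_t / p_{t-1} - 1, for the given columns."""
    subset = prices[list(columns)]
    # Same formula as in the question, applied to all columns at once;
    # the first row has no previous price, so it is dropped.
    return (subset / subset.shift(1) - 1).dropna(axis=0)


# Dummy prices for three placeholder tickers.
prices = pd.DataFrame(
    {"Stock1": [100.0, 110.0, 99.0],
     "Stock2": [50.0, 50.0, 55.0],
     "Stock3": [20.0, 21.0, 21.0]},
    index=pd.date_range("2013-01-01", periods=3, freq="D"),
)
returns = simple_returns(prices, ["Stock1", "Stock2", "Stock3"])
print(returns)
```

When fitting the model inside handle_data, use only the rows up to the current bar to avoid look-ahead.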


Source: (StackOverflow)

Transforming financial data from postgres to pandas dataframe for use with Zipline

I'm new to pandas and Zipline, and I'm trying to learn how to use them (and use them with this data that I have). Any tips, even if not a full solution, would be much appreciated. I have tried a number of things and have gotten quite close, but I run into indexing issues, in particular Exception: Reindexing only valid with uniquely valued Index objects. [Pandas 0.10.0, Python 2.7]

I'm trying to transform monthly returns data I have for thousands of stocks in postgres from the form:

ticker_symbol :: String, monthly_return :: Float, date :: Timestamp

e.g.

AAPL, 0.112, 28/2/1992
GS, 0.13, 30/11/1981
GS, -0.23, 22/12/1981

NB: The frequency of the reporting is monthly, but there is going to be considerable NaN data here, as not all of the over 6000 companies I have here are going to be around at the same time.

…to the form described below, which is what Zipline needs to run its backtester. (I think. Can Zipline's backtester work with monthly data like this, easily? I know it can, but any tips for doing this?)


Below is a DataFrame (of time series? How do you say this?) in the format I need:

> data:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2268 entries, 1993-01-04 00:00:00+00:00 to 2001-12-31 00:00:00+00:00
Data columns:
AA      2268  non-null values
AAPL    2268  non-null values
GE      2268  non-null values
IBM     2268  non-null values
JNJ     2268  non-null values
KO      2268  non-null values
MSFT    2268  non-null values
PEP     2268  non-null values
SPX     2268  non-null values
XOM     2268  non-null values
dtypes: float64(10)

Below is a TimeSeries, also in the format I need.

> data.AAPL:

Date
1993-01-04 00:00:00+00:00    73.00
1993-01-05 00:00:00+00:00    73.12
...

2001-12-28 00:00:00+00:00    36.15
2001-12-31 00:00:00+00:00    35.55
Name: AAPL, Length: 2268

Note that there isn't return data here, but prices instead. They're adjusted (by Zipline's load_from_yahoo, though from reading the source, really by functions in pandas) for dividends, splits, etc., so there's an isomorphism (less the initial price) between that and my return data (so, no problem here).

(EDIT: Let me know if you'd like me to write what I have, or attach my iPython notebook or a gist; I just doubt it'd be helpful, but I can absolutely do it if requested.)
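The "Reindexing only valid with uniquely valued Index objects" error typically means some label appears more than once. In long-format data like this, the natural suspect is the same ticker appearing twice on the same date. The sketch below covers the long-to-wide step. It assumes a recent pandas (not the 0.10.0 in the question) and dates in day/month/year order, as in the examples. It checks for duplicate (date, ticker) pairs first. It then pivots so each ticker becomes a column, with NaN wherever a company has no data, and labels the index as UTC to match the frame above. `returns_to_wide` is a hypothetical name, and reading the rows out of Postgres is left out.

```python
import pandas as pd


def returns_to_wide(rows):
    """Pivot (ticker, monthly_return, date) rows into a date x ticker DataFrame."""
    long_df = pd.DataFrame(rows, columns=["ticker_symbol", "monthly_return", "date"])
    long_df["date"] = pd.to_datetime(long_df["date"], format="%d/%m/%Y")
    # Duplicate (date, ticker) pairs would give a non-unique index after pivoting.
    dupes = long_df.duplicated(subset=["date", "ticker_symbol"], keep=False)
    if dupes.any():
        raise ValueError("duplicate (date, ticker) rows:\n%s" % long_df[dupes])
    wide = long_df.pivot(index="date", columns="ticker_symbol", values="monthly_return")
    wide.index = wide.index.tz_localize("UTC")
    return wide.sort_index()


# The three example rows from the question.
rows = [("AAPL", 0.112, "28/2/1992"),
        ("GS", 0.13, "30/11/1981"),
        ("GS", -0.23, "22/12/1981")]
wide = returns_to_wide(rows)
print(wide)
```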


Source: (StackOverflow)

Converting a pandas MultiIndex DataFrame from rows-wise to column-wise

I'm working in zipline and pandas and have converted a pandas.Panel to a pandas.DataFrame using the to_frame() method. This is the resulting pandas.DataFrame which as you can see is multi-indexed:

                                  price
major                     minor                
2008-01-03 00:00:00+00:00 SPY    129.93
                          KO      26.38
                          PEP     64.78
2008-01-04 00:00:00+00:00 SPY    126.74
                          KO      26.43
                          PEP     64.59
2008-01-07 00:00:00+00:00 SPY    126.63
                          KO      27.05
                          PEP     66.10
2008-01-08 00:00:00+00:00 SPY    124.59
                          KO      27.16
                          PEP     66.63

I need to convert this frame to look like this:

                          SPY     KO     PEP
2008-01-03 00:00:00+00:00 129.93  26.38  64.78
2008-01-04 00:00:00+00:00 126.74  26.43  64.59
2008-01-07 00:00:00+00:00 126.63  27.05  66.10
2008-01-08 00:00:00+00:00 124.59  27.16  66.63

I've tried the pivot method, stack/unstack, etc. but these methods are not what I'm looking for. I'm really quite stuck at this point and any help is appreciated.
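Since the frame has a single column, one approach is to select that column first. This gives a Series indexed by (major, minor), and unstacking the `minor` level then makes each symbol a column. Unstacking the whole DataFrame instead leaves a two-level column index (price, symbol), which may be why unstack looked wrong. The sketch below rebuilds the frame from the values shown in the question and assumes the level names `major` and `minor` as printed.

```python
import pandas as pd

# Rebuild the question's long frame: a (major, minor) MultiIndex with a 'price' column.
dates = pd.DatetimeIndex(["2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"], tz="UTC")
symbols = ["SPY", "KO", "PEP"]
index = pd.MultiIndex.from_product([dates, symbols], names=["major", "minor"])
long_frame = pd.DataFrame(
    {"price": [129.93, 26.38, 64.78,
               126.74, 26.43, 64.59,
               126.63, 27.05, 66.10,
               124.59, 27.16, 66.63]},
    index=index,
)

# Select the single column (a Series), then move the 'minor' level into columns.
wide = long_frame["price"].unstack("minor")
# Put the columns in the question's order (unstack need not preserve it).
wide = wide[symbols]
wide.columns.name = None
print(wide)
```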


Source: (StackOverflow)

pip install gives error: Unable to find vcvarsall.bat

Using pip install zipline on Windows 8 with Python 2.7 gives me the error:

Downloading/unpacking six (from python-dateutil==2.1->delorean->zipline[all])
  Running setup.py egg_info for package six

Installing collected packages: blist, pytz, requests, python-dateutil, six
  Running setup.py install for blist
    building '_blist' extension
    error: Unable to find vcvarsall.bat
    Complete output from command C:\Python27\python.exe -c "import setuptools;__file__='c:\\users\\ThatsMe\\appdata\\local\\temp\\pip-build-ThatsMe\\blist\\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\ThatsMe\appdata\local\temp\pip-xvoky2-record\install-record.txt --single-version-externally-managed:

running install
running build
running build_py
running build_ext
building '_blist' extension
error: Unable to find vcvarsall.bat

Question: How can the error be resolved? Running pip install zipline[all] gives the same error...


Source: (StackOverflow)