zipline
Zipline, a Pythonic Algorithmic Trading Library
I installed the zipline package via Enthought Canopy. Now, when I try to run a script that uses it from the command prompt, I get the error ImportError: No module named zipline.
I also tried to run the same code using IPython, with the same output.
I think it is related to python virtual environments, but don't know how to fix that.
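A common cause of this error is that the script runs under a different Python interpreter than the one the package was installed into. A minimal diagnostic sketch, using only the standard library and assuming Python 3.4 or later (the helper name is_importable is made up for illustration):

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is actually executing this script.
print("interpreter:", sys.executable)
# Whether that interpreter can see zipline.
print("zipline importable:", is_importable("zipline"))
```

If the printed interpreter path is not the one belonging to the Canopy environment, running the script with that environment's own python executable is the usual thing to try.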
Source: (StackOverflow)
I was trying zipline out with my own CSV file. I took the data that is downloaded from Yahoo in the default case and copied it into a CSV file.
The file format is:
Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,409.399971,412.499989,408.999989,411.229973,75555200,54.934461
But when I print the data being passed to handle_data, I get this:
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 409.39997099999999, 'sid': 'Open'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 412.49998900000003, 'sid': 'High'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 408.99998900000003, 'sid': 'Low'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 411.22997299999997, 'sid': 'Close'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 75555200.0, 'sid': 'Volume'}
{'volume': 1000, 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'price': 54.934460999999999, 'sid': 'Adj Close'}
{'volume': 1000, 'dt': Timestamp('2012-01-04 00:00:00+0000', tz='UTC'), 'price': 410.00001099999997, 'sid': 'Open'}
BarData({'Volume': SIDData({'volume': 1000, 'sid': 'Volume', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 75555200.0}), 'Adj Close': SIDData({'volume': 1000, 'sid': 'Adj Close', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 54.934461}), 'High': SIDData({'volume': 1000, 'sid': 'High', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 412.499989}), 'Low': SIDData({'volume': 1000, 'sid': 'Low', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 408.999989}), 'Close': SIDData({'volume': 1000, 'sid': 'Close', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 411.229973}), 'Open': SIDData({'volume': 1000, 'sid': 'Open', 'source_id': 'DataFrameSource-7006398718743e03d6d00635b63c8e98', 'dt': Timestamp('2012-01-03 00:00:00+0000', tz='UTC'), 'type': 4, 'price': 409.399971})})
After that I do a simple:
data = pd.read_csv('appleDataFromYahoo.csv', index_col='Date', parse_dates=True)
data.index = data.index.tz_localize(pytz.UTC)
data.head()
algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data, capital_base=1000000)
results = algo.run(data)
The data I am getting in the logs is obviously wrong, because in the default case handle_data prints something like this:
BarData({'AAPL': SIDData({'high': 44.118027429170454, 'open': 43.50086143458919, 'price': 44.025853000000005, 'volume': 111284600, 'low': 43.393986829608814, 'sid': 'AAPL', 'source_id': 'DataPanelSource-714fe8ac9fdca11967199c1edefb9597', 'close': 44.025853000000005, 'dt': Timestamp('2011-01-03 00:00:00+0000', tz='UTC'), 'type': 4})})
Can someone please point me in the direction of what I might be doing wrong? Is my file format wrong? Do I need to provide another field specifying the symbol ID, or something else?
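The log output suggests that each DataFrame column is treated as a separate security, which is why Open, High and the other field names show up as sids. A minimal sketch of one way around that, assuming the DataFrame passed to run should hold one column per symbol; the choice of the Close column and the column name AAPL are assumptions, and the single row is the sample row quoted above:

```python
import io

import pandas as pd

# The sample row from the CSV shown above.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,409.399971,412.499989,408.999989,411.229973,75555200,54.934461
"""

raw = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
raw.index = raw.index.tz_localize("UTC")

# Keep one price column and name it after the symbol, so that the frame
# has one column per security instead of one per OHLCV field.
prices = raw[["Close"]].rename(columns={"Close": "AAPL"})
print(prices)
```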
Source: (StackOverflow)
I've tested ADF (the augmented Dickey-Fuller test) with stattools. My code is:
import statsmodels.tsa.stattools as ts
from datetime import datetime
import zipline
from zipline import TradingAlgorithm
from zipline.api import order_target, record, symbol, history, add_history
data = get_pricing(['AAPL'],start_date='2015-01-13',end_date = '2015-02-13',frequency='daily')
result = ts.adfuller(data.price.values,1)
However, I got the following error:
ValueError Traceback (most recent call last)
<ipython-input-45-44774ff05797> in <module>()
7
8 data = get_pricing(['AAPL'],start_date='2015-01-13',end_date = '2015-02-13',frequency='daily')
----> 9 result = ts.adfuller(data.price.values,1)
/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
209
210 xdiff = np.diff(x)
--> 211 xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')
212 nobs = xdall.shape[0] # pylint: disable=E1103
213
/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/tsatools.pyc in lagmat(x, maxlag, trim, original)
322 if x.ndim == 1:
323 x = x[:,None]
--> 324 nobs, nvar = x.shape
325 if original in ['ex','sep']:
326 dropidx = nvar
ValueError: too many values to unpack
I can't see the problem. I think the parameters passed to adfuller are OK.
Could anyone help? Thanks a lot.
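The traceback ends at nobs, nvar = x.shape, which fails when x has more than two dimensions. One plausible cause, assuming data.price.values is two-dimensional because a list of symbols was requested, can be reproduced with NumPy alone; the array values below are placeholders, not real prices:

```python
import numpy as np

# Placeholder values standing in for an (n_days, 1) price array.
x2d = np.arange(5, dtype=float).reshape(5, 1)

# What the traceback shows adfuller doing: diff, then add an axis.
xdiff = np.diff(x2d)
print(xdiff[:, None].shape)  # three dimensions, so "nobs, nvar = x.shape" fails

# Flattening to one dimension first gives the two-dimensional shape
# that the unpacking expects.
x1d = x2d.ravel()
print(np.diff(x1d)[:, None].shape)
```

If that is the cause here, passing a one-dimensional array, for example data.price.values.ravel(), may avoid the error.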
Source: (StackOverflow)
I'm teaching myself zipline and Python. I have learned a very simple trading strategy and now I want to backtest it. Several days ago I found zipline, but my data structure is different from the US data. The data structure is shown below (stock 'a'):
            open  high  low  close  volume  ma_5  ma_10  (etc)
index
2015-01-01   1.5   2.6  1.2    2.0    1000   1.2    1.3
2015-01-03   1.5   2.6  1.2    2.0    1000   1.2    1.3
I just want to run a very simple example, as in the following code:
from datetime import datetime
from zipline.algorithm import TradingAlgorithm
from zipline.api import order, record, symbol
import pandas as pd
import data as dt
def initialize(context):
    pass
def handle_data(context, data):
    order('close', 10)
    # the DataFrame does not contain price, and how to fix AAPL?
    record(AAPL=data['close'].price)
dat = dt.get_data('600848')  # our country's stock code; returns a DataFrame with the structure above
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data)
# Run algorithm
perf_manual = algo_obj.run(dat)
I know there are many errors in the code, and I have read a similar question about European data asked by someone else, but it is too complex for me right now. Could anyone show me how much work is needed to fix the code, or whether I should look for another package better suited to China?
My goal is to order a stock such as 600848 at its close price on day 2015-01-01.
Source: (StackOverflow)
I am using zipline to backtest with local data, but without success.
from datetime import datetime
import pytz
import pandas as pd
from zipline.algorithm import TradingAlgorithm
import zipline.utils.factory as factory
class BuyApple(TradingAlgorithm):
    def handle_data(self, data):
        self.order('AAPL', 1)
if __name__ == '__main__':
    data = pd.read_csv('AAPL.csv')
    simple_algo = BuyApple()
    results = simple_algo.run(data)
Above is my code. When I run this script, I get the message:
[2015-04-03 01:41:53.712035] WARNING: Loader: No benchmark data found for date range.
start_date=2015-04-03 00:00:00+00:00, end_date=2015-04-03 01:41:53.632300, url=http://ichart.finance.yahoo.com/table.csv?a=3&c=2015&b=3&e=3&d=3&g=d&f=2015&s=%5EGSPC
Traceback (most recent call last):
File "bollinger.py", line 31, in <module>
results = simple_algo.run(data)
File "/home/xinzhou/.local/lib/python2.7/site-packages/zipline-0.7.0-py2.7.egg/zipline/algorithm.py", line 372, in run
source = DataFrameSource(source)
File "/home/xinzhou/.local/lib/python2.7/site-packages/zipline-0.7.0-py2.7.egg/zipline/sources/data_frame_source.py", line 42, in __init__
assert isinstance(data.index, pd.tseries.index.DatetimeIndex)
AssertionError
Then I changed my code to the following:
from datetime import datetime
import pytz
import pandas as pd
from zipline.algorithm import TradingAlgorithm
import zipline.utils.factory as factory
class BuyApple(TradingAlgorithm):
    def handle_data(self, data):
        self.order('AAPL', 1)
if __name__ == '__main__':
    start = datetime(2000, 1, 9, 14, 30, 0, 0, pytz.utc)
    end = datetime(2001, 1, 10, 21, 0, 0, 0, pytz.utc)
    data = pd.read_csv('AAPL.csv', parse_dates=True, index_col=0)
    sim_params = factory.create_simulation_parameters(
        start=start, end=end, capital_base=10000)
    sim_params.data_frequency = '1d'
    sim_params.emission_rate = '1d'
    simple_algo = BuyApple()
    results = simple_algo.run(data)
The
assert isinstance(data.index, pd.tseries.index.DatetimeIndex)
AssertionError
is gone. But my terminal keeps showing this message:
[2015-04-03 01:44:28.141657] WARNING: Loader: No benchmark data found for date range.
start_date=2015-04-03 00:00:00+00:00, end_date=2015-04-03 01:44:28.028243, url=http://ichart.finance.yahoo.com/table.csv?a=3&c=2015&b=3&e=3&d=3&g=d&f=2015&s=%5EGSPC
How can I solve this problem? Thanks.
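The AssertionError in the first version comes from the index check shown in the traceback: the DataFrame index must be a DatetimeIndex. A minimal sketch of the difference between the two read_csv calls, using a two-row in-memory CSV whose values are placeholders:

```python
import io

import pandas as pd

# Placeholder CSV content; the real AAPL.csv is not shown in the question.
csv_text = """Date,Close
2000-01-10,1.0
2000-01-11,2.0
"""

plain = pd.read_csv(io.StringIO(csv_text))
dated = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# Without index_col/parse_dates the index is just row numbers.
print(type(plain.index).__name__)
# With them, the first column becomes a DatetimeIndex.
print(isinstance(dated.index, pd.DatetimeIndex))
```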
Source: (StackOverflow)
I have a problem trying to backtest a financial idea I've got with Zipline.
What I've got is closing price for SPY in the first column data['SPY'], and a "homemade" signal data['Signal']. The signal works similar to a moving average, and is either below or above data['SPY'].
data = pd.read_csv('data.csv', index_col='Date', parse_dates=True)
Now the trading logic is not working, since if I understand it correctly, pandas does not make computations by rows but instead computes on the whole dataset at once.
def handle_data(self, data):
    if data['SPY'] - data['Signal'] > 0:
        self.order('SPY', 1)
    else:
        self.order('SPY', -1)
The non-working code above expresses the logic that I want to perform: if SPY > Signal then buy, and sell if the opposite is true. I've tried all kinds of iterations but just can't make it work.
Can anyone help a struggling amateur trader out?
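Outside zipline, the same buy/sell rule can be checked on the whole DataFrame at once, which is a quick way to confirm what the signal should produce on each date. A minimal pandas sketch of that vectorized check; the numbers are made up for illustration and the column name position is hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up values, only to illustrate the comparison.
data = pd.DataFrame(
    {"SPY": [100.0, 101.0, 99.0], "Signal": [99.5, 101.5, 98.0]},
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"]),
)

# +1 where SPY is above the signal (buy), -1 otherwise (sell).
data["position"] = np.where(data["SPY"] > data["Signal"], 1, -1)
print(data)
```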
#
I managed to get a bit further. What I needed to do was to add a "transform" to my dataframe, in this case a Moving Average. Since I don't want to actually use a moving average, I set the number of days to 1, which in effect should mean that I use the original numbers.
class Test(TradingAlgorithm):
    def initialize(self):
        self.add_transform(MovingAverage, 'dummy', ['price'], window_length=1) # The moving average that I don't really want to utilize.
        self.pos_long = False
        self.pos_short = False
    def handle_data(self, data):
        if data['SPY'].dummy['price'] >= data['Signal'].dummy['price'] and not self.pos_long:
            self.order('SPY', 100)
            self.pos_long = True
            self.pos_short = False
        elif data['SPY'].dummy['price'] <= data['Signal'].dummy['price'] and not self.pos_short:
            self.order('SPY', -100)
            self.pos_long = False
            self.pos_short = True
The above code works almost exactly as I'd like, the only quirk is that for some reason it doesn't take any short positions. But it's at least a step in the right direction, and might help others out.
These two links below helped me figure this bit out.
http://zipline.readthedocs.org/en/latest/zipline.transforms.html
http://nbviewer.ipython.org/github/twiecki/financial-analysis-python-tutorial/blob/master/3.%20Backtesting%20using%20Zipline.ipynb
Source: (StackOverflow)
I am using Zipline in an IPython Notebook to backtest. However, I am fairly new to the library and was wondering if there is any way to add 1-minute data. Currently I am able to receive 1-day open, high, low, close, etc. using the following code:
start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2015,6,30,0,0,0,0, pytz.utc)
data = load_bars_from_yahoo(stocks=["AAPL"], start=start, end=end)
data.save('talk_px.dat')
Is there any way I could change the frequency from 1 day to 1 minute?
Source: (StackOverflow)
I am trying to figure out how to use my own csv datafiles (originally from Yahoo finance) to be used within Zipline. I know you need to load the csv file into a pandas dataframe. But I can't seem to stop Zipline from downloading the data from Yahoo.
My csv file format:
Date, Open, High, Low, Close, Volume, AdjClose
My algofile:
from zipline.api import order, record, symbol
import pandas as pd
data = pd.read_csv('AAPL.csv')
def initialize(context):
    pass
def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)
My command line to create the pickle file:
run_algo.py -f E:\..\Main.py --start 2011-1-1 --end 2015-1-1 -o buyapple_out.pickle
Command line output:
[2015-03-27 10:18:20.809959] WARNING: Loader: No benchmark data found for date range.
start_date=2015-03-27 00:00:00+00:00, end_date=2015-03-27 10:18:19.973911, url=http://ichart.finance.yahoo.com/table.csv?a=2&s=%5EGSPC&b=27&e=27&d=2&g=d&f=2015&c=2015
[2015-03-27 10:20:05.811965] INFO: Performance: Simulated 504 trading days out of 504.
[2015-03-27 10:20:05.811965] INFO: Performance: first open: 2013-01-02 14:31:00+00:00
[2015-03-27 10:20:05.811965] INFO: Performance: last close: 2014-12-31 21:00:00+00:00
My pickle file is created correctly. But it appears to still be using yahoo instead of my csv because the command line output talks about yahoo finance.
There seems to be no documentation from Zipline on how to do this, other than 'load the csv into a dataframe'. What else?
Many thanks.
Source: (StackOverflow)
Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.
My goal is to get the last date in the DatetimeIndex for each month.
.to_period('M') and .to_timestamp('M') don't work, since they give the last day of the month rather than the last value of the variable in each month.
As an example, if this is my time series I would want to select '2015-05-29', while the last day of the month is '2015-05-31'.
['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
'2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
'2015-05-29', '2015-06-01']
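One way to pick the last date that is actually present in each month is to group the dates by year and month and take the maximum of each group. A minimal pandas sketch using the dates from the example above (the variable names are made up for illustration):

```python
import pandas as pd

# The dates from the example above.
idx = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21",
    "2015-05-22", "2015-05-26", "2015-05-27", "2015-05-28",
    "2015-05-29", "2015-06-01",
])

# Group the dates by (year, month) and keep the latest date in each group.
dates = idx.to_series()
last_per_month = dates.groupby([idx.year, idx.month]).max()
print(last_per_month.tolist())
```

For this index the result contains 2015-05-29 for May and 2015-06-01 for June, since the maximum is taken over the dates that exist rather than over the calendar.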
Source: (StackOverflow)
I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:
                        BARC    HSBA     LLOY     STAN
Date
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00
I've followed the moving-averages tutorial, replacing "AAPL" with my own symbol code, and the historical calls with "1m" data instead of "1d".
Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.
However there are all sorts of problems arising related to TradingEnvironment. According to the source code:
# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
#     from zipline.finance import trading
#     trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
#     lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
#     with lse:
#         the code here will have lse as the global trading.environment
#         algo.run(start, end)
However, using the context doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close, the times are for the US market).
My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?
N.B. I've seen there are some tests on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but these appear to handle data only at the daily level. I would need to fix:
- open/close times
- reference data for the benchmark
- and probably more ...
Source: (StackOverflow)
Some code that I'm reading uses @batch_transform. What does the @ symbol do? Is it IPython-specific?
from zipline.transforms import batch_transform
from scipy import stats
@batch_transform
def regression_transform(data):
    pep_price = data.price['PEP']
    ko_price = data.price['KO']
    slope, intercept, _, _, _ = stats.linregress(pep_price, ko_price)
    return intercept, slope
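The @ syntax is a standard Python decorator and is not specific to IPython: placing @name above a function definition passes the function to name and rebinds the function's name to whatever name returns. A minimal sketch with a made-up decorator (shout, greet and greet_plain are hypothetical names, unrelated to zipline):

```python
def shout(func):
    """Decorator that upper-cases the string returned by func."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return "hello " + name


# The decorated definition above is equivalent to this explicit form.
def greet_plain(name):
    return "hello " + name


greet_explicit = shout(greet_plain)

print(greet("zipline"))
print(greet_explicit("zipline"))
```

By the same rule, @batch_transform hands regression_transform to batch_transform and keeps the returned object under the name regression_transform.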
Source: (StackOverflow)
I am new to Python and am trying to modify a pair trading script that I found here:
https://github.com/quantopian/zipline/blob/master/zipline/examples/pairtrade.py
The original script is designed to use only prices. I would like to use returns to fit my models and prices for the invested quantity, but I don't see how to do it.
I have tried:
- to define a data frame of returns in the main and call it in run
- to define a data frame of returns in the main as a global object and use it where needed in handle_data
- to define a data frame of returns directly in handle_data
I assume the last option is the most appropriate, but then I get an error about the pandas 'shift' attribute.
More specifically, I try to define 'DataRegression' as follows:
DataRegression = data.copy()
DataRegression[Stock1]=DataRegression[Stock1]/DataRegression[Stock1].shift(1)-1
DataRegression[Stock2]=DataRegression[Stock2]/DataRegression[Stock2].shift(1)-1
DataRegression[Stock3]=DataRegression[Stock3]/DataRegression[Stock3].shift(1)-1
DataRegression = DataRegression.dropna(axis=0)
where 'data' is a data frame which contains prices, and Stock1, Stock2 and Stock3 are column names defined globally. Placed in handle_data, those lines raise the error:
File "A:\Apps\Python\Python.2.7.3.x86\lib\site-packages\zipline-0.5.6-py2.7.egg\zipline\utils\protocol_utils.py", line 85, in __getattr__
return self.__internal[key]
KeyError: 'shift'
Would anyone know why and how to do that correctly?
Many Thanks,
Vincent
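The KeyError is raised from zipline's protocol_utils, which suggests that the object received inside handle_data is not a pandas DataFrame and so has no shift method. On an actual DataFrame of prices the same return calculation works. A minimal sketch, assuming a plain price DataFrame; the prices are sample numbers used only for illustration:

```python
import pandas as pd

# Sample prices, used only for illustration.
prices = pd.DataFrame(
    {"SPY": [129.93, 126.74, 126.63], "KO": [26.38, 26.43, 27.05]},
    index=pd.to_datetime(["2008-01-03", "2008-01-04", "2008-01-07"]),
)

# Simple returns: price today divided by price yesterday, minus one.
returns = (prices / prices.shift(1) - 1).dropna(axis=0)
print(returns)

# pct_change computes the same quantity.
same = prices.pct_change().dropna(axis=0)
```

Computing the returns once on the full price DataFrame, before the algorithm runs, avoids calling DataFrame methods on the per-bar object.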
Source: (StackOverflow)
I'm new to Pandas and Zipline, and I'm trying to learn how to use them (and use them with this data that I have). Any sorts of tips, even if no full solution, would be much appreciated. I have tried a number of things, and have gotten quite close, but run into indexing issues, in particular Exception: Reindexing only valid with uniquely valued Index objects. [Pandas 0.10.0, Python 2.7]
I'm trying to transform monthly returns data I have for thousands of stocks in postgres from the form:
ticker_symbol :: String, monthly_return :: Float, date :: Timestamp
e.g.
AAPL, 0.112, 28/2/1992
GS, 0.13, 30/11/1981
GS, -0.23, 22/12/1981
NB: The frequency of the reporting is monthly, but there is going to be considerable NaN data here, as not all of the over 6000 companies I have here are going to be around at the same time.
…to the form described below, which is what Zipline needs to run its backtester. (I think. Can Zipline's backtester work with monthly data like this, easily? I know it can, but any tips for doing this?)
The below is a DataFrame (of timeseries? How do you say this?), in the format I need:
> data:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2268 entries, 1993-01-04 00:00:00+00:00 to 2001-12-31 00:00:00+00:00
Data columns:
AA 2268 non-null values
AAPL 2268 non-null values
GE 2268 non-null values
IBM 2268 non-null values
JNJ 2268 non-null values
KO 2268 non-null values
MSFT 2268 non-null values
PEP 2268 non-null values
SPX 2268 non-null values
XOM 2268 non-null values
dtypes: float64(10)
The below is a TimeSeries, and is in the format I need.
> data.AAPL:
Date
1993-01-04 00:00:00+00:00 73.00
1993-01-05 00:00:00+00:00 73.12
...
2001-12-28 00:00:00+00:00 36.15
2001-12-31 00:00:00+00:00 35.55
Name: AAPL, Length: 2268
Note, there isn't return data here, but prices instead. They're adjusted (by Zipline's load_from_yahoo, though, from reading the source, really by functions in pandas) for dividends, splits, etc., so there's an isomorphism (less the initial price) between that and my return data (so, no problem here).
(EDIT: Let me know if you'd like me to write what I have, or attach my iPython notebook or a gist; I just doubt it'd be helpful, but I can absolutely do it if requested.)
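Going from one row per (ticker, date) to one column per ticker is a long-to-wide reshape. A minimal pandas sketch using the three example rows above, assuming the rows have already been fetched from postgres into a DataFrame (the variable names long_form and wide are made up for illustration):

```python
import pandas as pd

# The three example rows from above, in "long" form.
long_form = pd.DataFrame({
    "ticker_symbol": ["AAPL", "GS", "GS"],
    "monthly_return": [0.112, 0.13, -0.23],
    "date": ["28/2/1992", "30/11/1981", "22/12/1981"],
})
# The dates are written day first.
long_form["date"] = pd.to_datetime(long_form["date"], format="%d/%m/%Y", utc=True)

# One row per date, one column per ticker; missing combinations become NaN.
wide = long_form.pivot(index="date", columns="ticker_symbol", values="monthly_return")
print(wide)
```

pivot raises an error if the same (date, ticker) pair appears twice, so duplicates would need to be removed or aggregated first; that may be related to the reindexing error mentioned above, though that is a guess.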
Source: (StackOverflow)
I'm working in zipline and pandas and have converted a pandas.Panel to a pandas.DataFrame using the to_frame() method. This is the resulting pandas.DataFrame, which as you can see is multi-indexed:
                                  price
major                     minor
2008-01-03 00:00:00+00:00 SPY    129.93
                          KO      26.38
                          PEP     64.78
2008-01-04 00:00:00+00:00 SPY    126.74
                          KO      26.43
                          PEP     64.59
2008-01-07 00:00:00+00:00 SPY    126.63
                          KO      27.05
                          PEP     66.10
2008-01-08 00:00:00+00:00 SPY    124.59
                          KO      27.16
                          PEP     66.63
I need to convert this frame to look like this:
                              SPY     KO    PEP
2008-01-03 00:00:00+00:00  129.93  26.38  64.78
2008-01-04 00:00:00+00:00  126.74  26.43  64.59
2008-01-07 00:00:00+00:00  126.63  27.05  66.10
2008-01-08 00:00:00+00:00  124.59  27.16  66.63
I've tried the pivot method, stack/unstack, etc. but these methods are not what I'm looking for. I'm really quite stuck at this point and any help is appreciated.
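Although unstack is mentioned above as not working, selecting the price column first and then unstacking the inner index level appears to give the requested layout. A minimal sketch that rebuilds the frame shown above and reshapes it (the variable names are made up for illustration):

```python
import pandas as pd

# Rebuild the multi-indexed frame shown above.
dates = pd.to_datetime(
    ["2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"], utc=True
)
index = pd.MultiIndex.from_product([dates, ["SPY", "KO", "PEP"]],
                                   names=["major", "minor"])
frame = pd.DataFrame(
    {"price": [129.93, 26.38, 64.78, 126.74, 26.43, 64.59,
               126.63, 27.05, 66.10, 124.59, 27.16, 66.63]},
    index=index,
)

# Move the inner index level ("minor") into the columns, then restore
# the original symbol order.
wide = frame["price"].unstack("minor")[["SPY", "KO", "PEP"]]
print(wide)
```

Selecting frame["price"] before unstacking keeps the result's columns as plain symbol names rather than a two-level (price, symbol) header.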
Source: (StackOverflow)
Using pip install zipline on Windows 8 with Python 2.7 gives me the error:
Downloading/unpacking six (from python-dateutil==2.1->delorean->zipline[all])
Running setup.py egg_info for package six
Installing collected packages: blist, pytz, requests, python-dateutil, six
Running setup.py install for blist
building '_blist' extension
error: Unable to find vcvarsall.bat
Complete output from command C:\Python27\python.exe -c "import setuptools;__file__='c:\\users\\ThatsMe\\appdata\\local\\temp\\pip-build-ThatsMe\\blist\\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\ThatsMe\appdata\local\temp\pip-xvoky2-record\install-record.txt --single-version-externally-managed:
running install
running build
running build_py
running build_ext
building '_blist' extension
error: Unable to find vcvarsall.bat
Question: How can the error be resolved? Running pip install zipline[all] gives the same error.
Source: (StackOverflow)