Building a Zipline Bundle for Yahoo CSV Files – Part I

Zipline is a fantastic tool for backtesting, and data is the main raw material for this kind of analysis. In this post, we focus on how to load our own data files. Through an example, we will create a bundle that loads data from csv files downloaded from Yahoo Finance.

We cover:

Zipline recap
A bundle overview
Creating a bundle for Yahoo csv daily data
Registering the bundle
Ingesting data into Zipline
Run a backtest with the new bundle

Zipline recap

As we saw in the last post, the Zipline library is a powerful backtesting tool that lets us focus on the strategy, although it takes some effort to get the system ready first.

Although Quantopian has stopped operations, we can still enjoy the great work they did with the Zipline library.

In this post, we will see how to load data into Zipline from sources such as Yahoo. The data will come from csv files for instruments without an expiration date, such as:

Stocks,
ETFs,
CFDs,
FX, etc.

Before reading on, remember that if you want to simplify your life, you can use Blueshift, which provides historical data for backtesting and real-time data, with connections to several brokers so you can take your algorithm live with minimal effort. Otherwise, keep reading.

Zipline calls the process of loading data the ingest process. The connector that reads a data source and loads it into Zipline is the bundle script.

By default, the Zipline library comes with a few bundles for data sources such as the Quandl WIKI database and csv files. However, we usually need to connect to other data sources with different formats, column names, etc.

For this reason, we need to create a bundle so that we can ingest the data and run backtests on it. That’s the topic we discuss here.

A bundle overview

A bundle is an ETL tool. Extraction, Transformation and Load (ETL) is a well-known process in data science. It means that the bundle Python script needs to connect to a data source (web, file or database) and then:

Extract the data and load it into memory in a convenient data structure, such as a DataFrame.
Transform the data by normalizing it: cleaning NA values and fixing column names, dates, times, etc.
Finally, load the normalized data into the Zipline data repository. By default, Zipline stores the price bars in bcolz files and the asset metadata and adjustments in SQLite databases.
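The extract and transform steps can be sketched with pandas. This is a minimal sketch, assuming the usual column layout of a Yahoo Finance daily download (Date, Open, High, Low, Close, Adj Close, Volume); the sample rows are invented, and the helper names extract and transform are hypothetical. The load step is left out because, inside a bundle, Zipline's own writers take care of it.

```python
import io

import pandas as pd

# Illustrative rows in the (assumed) Yahoo Finance daily CSV layout.
# The numbers are made up; they are not real market data.
YAHOO_SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,10.0,10.5,9.8,10.2,10.1,1000
2020-01-03,10.2,10.4,10.0,10.3,10.2,1200
2020-01-06,10.3,10.8,10.1,10.7,10.6,
"""


def extract(source):
    """Extract: read a CSV source into a DataFrame indexed by date."""
    return pd.read_csv(source, parse_dates=["Date"], index_col="Date")


def transform(df):
    """Transform: normalize column names and NA values."""
    df = df.rename(columns=str.lower)    # 'Open' -> 'open', etc.
    df = df.drop(columns=["adj close"])  # not used in this sketch
    df = df.dropna()                     # drop rows with missing fields
    df.index.name = "date"
    return df.sort_index()


frame = transform(extract(io.StringIO(YAHOO_SAMPLE)))
print(frame)  # two complete rows; the row with no volume is dropped
```

The lowercase date/open/high/low/close/volume names follow the csvdir bundle's convention; its documentation also lists optional dividend and split columns, which this sketch does not produce.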

Although this may seem like an overwhelming task, we can use the existing csvdir bundle as a template, which makes developing the bundle a bit easier.

Creating a bundle for Yahoo csv daily data

Let’s assume we have a folder with daily data downloaded from Yahoo. Note that, by default, the csvdir.py script looks for the data inside folders named daily and minute, hence we need to include Yahoo’s csv files inside the daily folder.
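To check that the folder layout is what the template expects, here is a small sketch that lists the symbols found in the daily folder. It assumes, as the csvdir template does, that each file is named <SYMBOL>.csv and that the file name without the extension is the asset symbol; list_symbols is a hypothetical helper, not part of Zipline.

```python
import os
import tempfile


def list_symbols(csvdir, tframe="daily"):
    """Return the sorted symbols found as <SYMBOL>.csv files in csvdir/tframe."""
    ddir = os.path.join(csvdir, tframe)
    symbols = sorted(
        name[: -len(".csv")] for name in os.listdir(ddir) if name.endswith(".csv")
    )
    if not symbols:
        raise ValueError("no <symbol>.csv files found in %s" % ddir)
    return symbols


# Usage: build a throwaway folder with the expected layout and list it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "daily"))
    for sym in ("SPY", "AAPL"):
        open(os.path.join(root, "daily", sym + ".csv"), "w").close()
    found = list_symbols(root)

print(found)  # ['AAPL', 'SPY']
```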

The whole process in one line:

We need to read the data, transform them to the Zipline format and load them into the Zipline repository. This is the ETL process.

We will use the csvdir bundle included with the library as a template. The csvdir.py script is inside the following folder:

~/opt/miniconda3/envs/zipline35/lib/python3.5/site-packages/zipline/data/bundles

The parts of this path that depend on your machine, such as the Miniconda install location, the Conda environment name (zipline35 here) and the Python version, will differ on your setup. Our customized bundle file must be in that folder too.
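Rather than guessing the path, we can ask Python where an installed package lives. This is a sketch using the standard importlib machinery; package_dir is a hypothetical helper, and calling it with "zipline.data.bundles" assumes you run it inside the Conda environment where Zipline is installed.

```python
import importlib.util
import os


def package_dir(name):
    """Return the folder of an installed package, e.g. 'zipline.data.bundles'.

    For dotted names, find_spec imports the parent packages as a side effect.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError("%s is not an installed package" % name)
    return os.path.abspath(list(spec.submodule_search_locations)[0])


# Inside the Zipline environment (assumption: zipline is importable there):
#   print(package_dir("zipline.data.bundles"))
print(package_dir("json"))  # a standard-library package, just to show the call
```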

First, let’s copy csvdir.py to a new file whose name reflects what we are going to do. Here we will make a bundle for Yahoo data listed on the NYSE, so we name it yahoo_NYSE.py.

Open the new yahoo_NYSE.py bundle in your favourite editor. We will edit it to adapt the Yahoo data to the Zipline data format so that we can use it in the ingestion process.
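The copy itself can be done from a shell or a file manager; a Python sketch follows. copy_template is a hypothetical helper that refuses to overwrite an existing file so that earlier edits to the bundle are not lost; the usage runs against a throwaway folder standing in for the real bundles folder.

```python
import os
import shutil
import tempfile


def copy_template(bundles_dir, src="csvdir.py", dst="yahoo_NYSE.py"):
    """Copy the csvdir template to a new bundle file in the same folder."""
    src_path = os.path.join(bundles_dir, src)
    dst_path = os.path.join(bundles_dir, dst)
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)  # never clobber an edited bundle
    shutil.copyfile(src_path, dst_path)
    return dst_path


# Usage against a throwaway folder standing in for .../zipline/data/bundles
with tempfile.TemporaryDirectory() as bundles_dir:
    with open(os.path.join(bundles_dir, "csvdir.py"), "w") as f:
        f.write("# template\n")
    new_file = copy_template(bundles_dir)
    with open(new_file) as f:
        copied = f.read()

print(os.path.basename(new_file))  # yahoo_NYSE.py
```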

Inside the file, we find the functions, classes and methods needed to carry out the ETL process. In this post we won’t explain all the code; the API documentation covers that. Here we’ll look at the parts we need to understand and change.

Change the name of the main function (csvdir_equities in the template). I like to use the same name as the file, so the name will be yahoo_NYSE.

This function accepts two input parameters. The first one is a list of the data frequencies: minute, daily or both. The second one is the folder where, in this case, we keep the Yahoo daily data. We don’t use these parameters at this point, but it is useful to be aware of them.

This function returns the ingest method of an instance of a class named CSVDIRBundle; rename that class to, for example, Yahoo_NYSEBundle.

At lines 92, 97 and 98 we need to change the bundle name; this is the name we pass to Zipline’s ingest command. Line 97 sets the name under which the bundle will be registered inside Zipline.

Inside the function declared at line 98, we can see the data format Zipline expects, along with code that handles the input parameters and works with metadata, splits, etc.
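The renaming touches three related pieces: the factory function, the class whose ingest method it returns, and the registered bundle function. The sketch below is a self-contained mock of that structure after renaming, assuming the template follows this factory/class/decorator layout. REGISTRY, register and yahoo_NYSE_bundle are hypothetical stand-ins; in the real file you keep the template’s own imports and registration decorator.

```python
# Hypothetical stand-in for Zipline's bundle registry (not the real API).
REGISTRY = {}


def register(name):
    """Stand-in decorator: record a bundle function under a name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


def yahoo_NYSE(tframes=None, csvdir=None):
    """Renamed factory (csvdir_equities in the template)."""
    return Yahoo_NYSEBundle(tframes, csvdir).ingest


class Yahoo_NYSEBundle:
    """Renamed class (CSVDIRBundle in the template)."""

    def __init__(self, tframes=None, csvdir=None):
        self.tframes = tframes
        self.csvdir = csvdir

    def ingest(self, *args, **kwargs):
        # Forward Zipline's arguments plus the two stored parameters
        # to the module-level bundle function.
        return yahoo_NYSE_bundle(
            *args, tframes=self.tframes, csvdir=self.csvdir, **kwargs
        )


@register("yahoo_NYSE")  # the name later passed to `zipline ingest -b`
def yahoo_NYSE_bundle(*args, tframes=None, csvdir=None, **kwargs):
    # The real function writes assets and bars; the mock only reports.
    return "ingesting %s from %s" % (tframes, csvdir)


ingest = yahoo_NYSE(["daily"], "/path/to/yahoo_csv")
print(ingest())          # ingesting ['daily'] from /path/to/yahoo_csv
print(sorted(REGISTRY))  # ['yahoo_NYSE']
```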

At line 161, we need to change the market calendar from CSVDIR to the generic NYSE market calendar.

The function we need to modify to adapt our data to the Zipline format is _pricing_iter, at line 171. It reads the csv files and passes each one to the bar writer, which stores it in the Zipline data repository.
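As a sketch of what this change means, assuming line 161 of the template is the assignment that sets the exchange column of the asset metadata: each asset is tagged with an exchange name, and Zipline resolves that name to a trading calendar. The metadata frame below mimics the template’s columns with invented dates.

```python
import pandas as pd

# Asset metadata with the columns the csvdir template builds
# (start_date, end_date, auto_close_date, symbol). Dates are illustrative.
metadata = pd.DataFrame(
    {
        "start_date": pd.to_datetime(["2020-01-02", "2020-01-02"]),
        "end_date": pd.to_datetime(["2020-12-31", "2020-12-31"]),
        "symbol": ["AAPL", "SPY"],
    }
)
# The auto-close date is the day after the last trade.
metadata["auto_close_date"] = metadata["end_date"] + pd.Timedelta(days=1)

# Template: metadata["exchange"] = "CSVDIR"   (an alias that resolves to NYSE)
# Yahoo NYSE bundle: name the NYSE calendar directly.
metadata["exchange"] = "NYSE"
print(metadata[["symbol", "exchange"]])
```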

Here we can see the key part of the code:

It reads the csv files; after that, we can inspect the content, rename columns, drop NA values or make any other change the data requires. For example, at line 188 we drop possible duplicate dates.

We can add as many print statements as needed to trace the code execution.
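A simplified version of that loop can be sketched as a generator that yields one (sid, DataFrame) pair per symbol. pricing_iter_sketch is hypothetical and leaves out the template’s metadata, split and dividend handling; it only shows the read, normalize and de-duplicate steps, plus a print for tracing.

```python
import os
import tempfile

import pandas as pd


def pricing_iter_sketch(csvdir, symbols):
    """Yield (sid, DataFrame) pairs, one per <SYMBOL>.csv in csvdir."""
    for sid, symbol in enumerate(symbols):
        path = os.path.join(csvdir, symbol + ".csv")
        dfr = pd.read_csv(path, parse_dates=[0], index_col=0).sort_index()
        dfr.columns = [c.lower() for c in dfr.columns]
        dfr = dfr[~dfr.index.duplicated(keep="first")]  # drop repeated dates
        print(symbol, sid, len(dfr))  # trace the execution
        yield sid, dfr


# Usage: one file with an out-of-order, duplicated date.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "SPY.csv"), "w") as f:
        f.write(
            "Date,Open,High,Low,Close,Volume\n"
            "2020-01-03,2,2,2,2,10\n"
            "2020-01-02,1,1,1,1,10\n"
            "2020-01-03,2,2,2,2,10\n"
        )
    results = list(pricing_iter_sketch(d, ["SPY"]))  # prints: SPY 0 2
```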

The key here is to align the csv data index with the NYSE market calendar. Line 207 needs the sessions variable to do that.

We create the session dates from the first to the last date in our data. Include this line at line 154, after the time frame is defined.

Add the new variable to the arguments of the write call at line 156.

Then accept it as an input parameter of the _pricing_iter function.

Finally, comment out or delete the last line of code (the register_calendar_alias call that maps CSVDIR to NYSE), because we want to use the NYSE calendar directly with these data files.
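The alignment itself boils down to reindexing each frame on the list of sessions. In this sketch, pandas’ bdate_range (Monday to Friday, no holidays) stands in for the NYSE calendar, which is an assumption made only to keep the example self-contained; in the bundle, the sessions would come from the trading calendar, for example through its sessions_in_range method. The prices are invented.

```python
import pandas as pd

# Daily frame with a weekend row (2020-01-04) and a missing session (2020-01-06).
dfr = pd.DataFrame(
    {"close": [10.0, 10.2, 10.1, 10.4]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-07"]),
)

# Stand-in for the NYSE sessions between the first and last date of the data.
# Assumption: weekdays only; a real exchange calendar also removes holidays.
sessions = pd.bdate_range(dfr.index[0], dfr.index[-1])

# Align to the sessions: the weekend row disappears, the missing session is NaN.
aligned = dfr.reindex(sessions)
print(aligned)
```

Reindexing rather than merging keeps exactly one row per session, which is the shape the daily bar writer expects; how to fill the resulting NaN rows is a separate decision.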

Stay tuned for the next installment to learn how to test the new bundle.

Visit QuantInsti for additional insight on this topic: https://blog.quantinsti.com/zipline-bundle-yahoo/.


Disclosure: Interactive Brokers Third Party

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
