Decision trees are a supervised machine learning method used for classification and regression problems; the approach is also known as CART (Classification and Regression Trees).
Recall that a classification problem tries to assign unknown elements to a class or category; the output is always a categorical variable (e.g. yes/no, up/down, red/blue/yellow).
A regression problem tries to forecast a number, such as the return for the next day. It should not be confused with linear regression, which is one specific technique that models a linear relationship between variables.
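Although classification and regression problems have different objectives, the trees have the same structure: a root node, internal decision nodes that each test one predictor variable, and leaf nodes that hold the output.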
Thanks to Python's scikit-learn (sklearn) library, the tree is created automatically for us, taking as a starting point the predictor variables that we hypothesize are responsible for the output we are looking for.
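As a minimal sketch of what "created automatically" means, the snippet below fits both kinds of tree with scikit-learn. The data are synthetic and the two predictor columns are purely hypothetical stand-ins, not the variables used later in this series.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                             # two hypothetical predictors
y_reg = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)   # numeric target
y_clf = np.where(y_reg > 0, 1, -1)                        # categorical target: up (1) / down (-1)

# Same interface for both problems; only the estimator and the target differ.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_clf)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_reg)

clf_pred = clf.predict(X[:5])   # class labels
reg_pred = reg.predict(X[:5])   # real numbers
print("classifier nodes:", clf.tree_.node_count)
print("regressor nodes:", reg.tree_.node_count)
```

Both fitted estimators expose the same underlying tree object (`tree_`), which reflects the point that the two trees share one structure even though their outputs differ.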
In this introductory post on decision trees, we will create a classification decision tree in Python to forecast whether the financial instrument we are going to analyze will go up or down the next day.
We will also create a regression decision tree to forecast the return of the index for the next day.
Preparing the Environment
Make sure the following software is available in order to follow the examples:
Building a Decision Tree
Building a classification decision tree and building a regression decision tree are very similar in the way we organize the input data and predictor variables. Then, by calling the corresponding functions, the tree is created automatically for us according to the criteria we specify.
The main steps to build a decision tree are:
The first four steps are common data-processing operations. If you are new to decision trees, the terms predictor variable and target variable may sound exotic, but they are nothing more than additional columns in the data frame that contain some type of indicator. These indicators, or predictors, are used to predict the target variable: whether the financial instrument will go up or down for the classification model, or the next day's return for the regression model. Likewise, splitting the data is a mandatory task in any backtesting process (ML or not): the idea is to have one set of data to train the model and another set, which has not been used in training, to test it.
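To make the idea of predictors and targets as "additional columns" concrete, here is a small sketch on a synthetic price series. The predictor names (`sma_ratio`, `vol_10d`), the 10-day window and the 70/30 split are illustrative assumptions, not the choices made in this series.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (assumption: a stand-in for real market data).
rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=300)
close = 3000 * np.exp(np.cumsum(rng.normal(0, 0.01, size=300)))
data = pd.DataFrame({"Close": close}, index=dates)

# Predictor variables: extra columns computed from past data only.
data["ret_1d"] = data["Close"].pct_change()
data["sma_ratio"] = data["Close"] / data["Close"].rolling(10).mean()
data["vol_10d"] = data["ret_1d"].rolling(10).std()

# Target variables: tomorrow's return (regression) and its sign (classification).
data["target_rgs"] = data["ret_1d"].shift(-1)
data = data.dropna().copy()          # drop warm-up rows and the last row (no "tomorrow")
data["target_cls"] = np.where(data["target_rgs"] > 0, 1, -1)

# Chronological split: the test set is strictly later than the training set.
split = int(len(data) * 0.7)
train, test = data.iloc[:split], data.iloc[split:]
print(len(train), len(test))
```

The split is done by position rather than by random shuffling so that no future observation leaks into the training set, which matters for time series.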
Steps 5 and 6 relate specifically to the ML algorithms for decision trees. As we will see, the implementation in Python is quite simple. However, it is essential to understand the parameterization and how to analyze the results. This post is eminently practical; to go deeper into the underlying mathematics, we recommend reading the references at the bottom of the post.
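The sketch below illustrates what training, parameterizing and analyzing a tree might look like with scikit-learn. The data are synthetic, the feature names are hypothetical, and the parameter values (`max_depth=3`, `min_samples_leaf=20`, Gini criterion) are arbitrary illustrations rather than recommended settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Synthetic predictors and targets (assumption for illustration).
rng = np.random.default_rng(7)
n = 500
X = rng.normal(size=(n, 3))
feature_names = ["pred_a", "pred_b", "pred_c"]   # hypothetical predictor names
y_ret = 0.01 * X[:, 0] + rng.normal(scale=0.01, size=n)
y_dir = np.where(y_ret > 0, 1, -1)

# Chronological 70/30 split.
split = int(n * 0.7)
X_train, X_test = X[:split], X[split:]

# Classification tree: the parameters limit how far the tree may grow.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3,
                             min_samples_leaf=20, random_state=0)
clf.fit(X_train, y_dir[:split])
acc = accuracy_score(y_dir[split:], clf.predict(X_test))

# Regression tree with the same growth limits.
reg = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
reg.fit(X_train, y_ret[:split])
mse = mean_squared_error(y_ret[split:], reg.predict(X_test))

# A text dump of the learned rules helps when analyzing the result.
rules = export_text(clf, feature_names=feature_names)
print(rules)
print("test accuracy:", acc, "test MSE:", mse)
```

Limiting depth and leaf size is one common way to keep a tree from memorizing the training set; evaluating on the held-out portion shows whether the learned rules generalize.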
Getting the data
The raw material for any algorithm is data. In our case, the data are time series of financial instruments, such as indices or stocks, which usually contain details like the opening price, maximum, minimum, closing price and volume. This information is recorded at a certain frequency, such as minutes, hours, days or weeks, and forms a time series.
There are multiple sources for downloading the data, both free and premium. Common sources of free daily data include Quandl, Yahoo and Google, but any other data source we trust will do.
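As a rough illustration of such a time series, the following sketch builds a small daily open/high/low/close/volume frame from made-up numbers and aggregates it to weekly bars. The column names are an assumption; real data sources may label them differently.

```python
import numpy as np
import pandas as pd

# Made-up daily bars for 20 business days (illustration only).
rng = np.random.default_rng(1)
idx = pd.bdate_range("2021-01-04", periods=20)
close = 100 + np.cumsum(rng.normal(0, 1, size=20))
open_ = close + rng.normal(0, 0.5, size=20)
daily = pd.DataFrame(
    {
        "Open": open_,
        "High": np.maximum(open_, close) + 0.5,
        "Low": np.minimum(open_, close) - 0.5,
        "Close": close,
        "Volume": rng.integers(1000, 5000, size=20),
    },
    index=idx,
)

# Change the recording frequency: aggregate daily bars into weekly bars.
weekly = daily.resample("W-FRI").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(weekly)
```

Each weekly bar takes the first open, the highest high, the lowest low, the last close and the summed volume of its week, which is the usual way to move a price series to a lower frequency.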
Here, we are going to work with about twenty years of daily data from the E-mini S&P 500, which we will retrieve through Quandl.
import quandl
df = quandl.get("CHRIS/CME_ES2")
We now have just over 21 years of E-mini S&P 500 data available. We will use the settle price as the closing price reference.
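Using the settle price as the close could look like the sketch below. Because the download needs network access, a tiny hand-made frame stands in for `df`; the column name `Settle` and the derived `Close` and `Return` columns are assumptions for illustration, and the prices are invented.

```python
import pandas as pd

# Hand-made stand-in for the downloaded frame (invented prices).
idx = pd.bdate_range("2019-01-01", periods=6)
df_demo = pd.DataFrame(
    {"Settle": [2500.0, 2510.0, 2490.0, 2520.0, 2515.0, 2530.0]}, index=idx
)

# Treat the settle price as the closing price reference.
df_demo["Close"] = df_demo["Settle"]

# Daily return from one close to the next; the first row has no prior close.
df_demo["Return"] = df_demo["Close"].pct_change()

# Length of the sample in years, a quick sanity check on the history available.
span_years = (df_demo.index[-1] - df_demo.index[0]).days / 365.25
print(df_demo)
print("years covered:", span_years)
```

The same span calculation applied to the real frame is one way to confirm how much history was actually retrieved.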
In the next post, the author will discuss creating the variables.
To download the code in this article, visit the QuantInsti website and see the educational offerings of their Executive Programme in Algorithmic Trading (EPAT™).
This article is from QuantInsti and is being posted with QuantInsti’s permission.