# IBKR Quant Blog

### Decision Tree For Trading Using Python - Part II

Creating the predictors

Predictor variables are data that we think are related to market behavior. These data can be very diverse such as the technical indicators, market data, sentiment data, breadth data, fundamental data, government data, etc. that will help us to make forecasts about the future behavior of the market.

We will test the classical indicators for trend following and for range trading, these are:

• EMA
• ATR
• RSI
• MACD

Therefore, the decision tree algorithm should help us select the best combination of indicators along with their parameters that maximize the expected output which is the target.

We are going to prepare the data by calculating the indicators that we will use as predictors using the Ta-lib library:

```import talib as ta
df['EMA10'] = ta.EMA(df['Settle'].values, timeperiod=10)
df['EMA30'] = ta.EMA(df['Settle'].values, timeperiod=30)
df['ATR'] = ta.ATR(df['High'].values, df['Low'].values, df['Settle'].values, timeperiod=14)
df['RSI'] = ta.RSI(df['Settle'].values, timeperiod=14)
macd, macdsignal, macdhist = ta.MACD(df['Settle'].values, fastperiod=12, slowperiod=26, signalperiod=9)
df['MACD'] = macd
df['MACDsignal'] = macdsignal
df.tail()```

We have already calculated the indicators, but it is necessary to emphasize that we have calculated them with the standard parameters and these can and must be optimized since the decision tree works with the pre-calculated indicators.

On the other hand, EMAs and MACDs do not serve as they are, since the signal comes from the price in relation to averages, or from one average in relation to the other. Let’s calculate the columns that will serve as predictors for the averages and the MACD.

```import numpy as np
df['ClgtEMA10'] = np.where(df['Settle'] > df['EMA10'], 1, -1)
df['EMA10gtEMA30'] = np.where(df['EMA10'] > df['EMA30'], 1, -1)
df['MACDSIGgtMACD'] = np.where(df['MACDsignal'] > df['MACD'], 1, -1)
df.tail()```

What we have now are possible trading rules that we will introduce in the decision tree to help us identify the best combination of these indicators to maximize the result.

• EMA, we are interested in when the price is above average and when the fastest average is above the slowest average.
• ATR(14), we’re interested in the threshold that will trigger the signal.
• ADX(14), we’re interested in the threshold that will trigger the signal.
• RSI(14), we’re interested in the threshold that will trigger the signal.
• MACD, we are interested in when the MACD signal is above MACD.

In this example, the predictor variables for the classification decision tree and the regression decision tree will be the same, although the target variables are different because for the classification algorithm the output will be categorical and for the regression algorithm the output will be continuous.

Creating the target variables

As we have already said, the classification and regression decision trees have different objectives. While the classification decision tree tries to characterize the future by offering a categorical variable, i.e. the market goes up or down, the regression decision tree tries to forecast the future value (i.e. the future market price).

We are going to create here the target variables for the two types of problems, although each one will use its own target.

```df['Return'] = df['Settle'].pct_change(1).shift(-1)
df['target_cls'] = np.where(df.Return > 0, 1, 0)
df['target_rgs'] = df['Return']
df.tail()```

The target variable for the regression algorithm (target_rgs) uses the lagged return, this is so, because we want the algorithm to learn what happened the next day, based on the information available at the present time.

The target variable for the classification algorithm (target_cls) also uses the lagged return, but because the output is categorical, we must transform it. If the return was positive, we assign 1 and if it was negative, we assign 0.

In the next installment, the author will discuss how to Obtain the data set for decision trees.

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

21734