In the previous installment, the author discussed Importing the dataset.
Preparing the dataset
dataset['H-L'] = dataset['High'] - dataset['Low']
dataset['O-C'] = dataset['Close'] - dataset['Open']
dataset['3day MA'] = dataset['Close'].shift(1).rolling(window = 3).mean()
dataset['10day MA'] = dataset['Close'].shift(1).rolling(window = 10).mean()
dataset['30day MA'] = dataset['Close'].shift(1).rolling(window = 30).mean()
dataset['Std_dev']= dataset['Close'].rolling(5).std()
dataset['RSI'] = talib.RSI(dataset['Close'].values, timeperiod = 9)
dataset['Williams %R'] = talib.WILLR(dataset['High'].values, dataset['Low'].values, dataset['Close'].values, 7)
We then prepare the input features that the artificial neural network will use to make its predictions. We define the following input features:
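- High minus Low price
- Close minus Open price
- Three-day moving average of the previous closing prices
- Ten-day moving average of the previous closing prices
- Thirty-day moving average of the previous closing prices
- Standard deviation of the closing price over a period of 5 days
- Relative Strength Index (period of 9)
- Williams %R (period of 7)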
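To make the rolling features concrete, here is a minimal sketch on made-up prices. The toy values and the three-day lookback are assumptions for illustration only, and Williams %R is written out in plain pandas from its standard definition rather than through talib, so treat it as a cross-check and not as the author's code.

```python
import pandas as pd

# Hypothetical toy prices; the article's dataset comes from the previous installment.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
high = close + 1.0
low = close - 1.0

# shift(1) moves each close down one row, so the average for a given day
# uses only the three preceding closes and never that day's own close.
ma3 = close.shift(1).rolling(window=3).mean()

# Williams %R over an assumed 3-day lookback, from its standard definition:
# -100 * (highest high - close) / (highest high - lowest low)
lookback = 3
highest_high = high.rolling(lookback).max()
lowest_low = low.rolling(lookback).min()
willr = -100.0 * (highest_high - close) / (highest_high - lowest_low)

print(pd.DataFrame({"Close": close, "3day MA": ma3, "Williams %R": willr}))
```

The first three rows of the moving average are NaN because the shifted three-day window is not yet full. Longer lookbacks leave more leading NaN rows.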
dataset['Price_Rise'] = np.where(dataset['Close'].shift(-1) > dataset['Close'], 1, 0)
We then define the output value as price rise, a binary variable that stores 1 when tomorrow's closing price is greater than today's closing price and 0 otherwise.
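The labeling rule can be checked by hand on a few made-up closing prices. This is a small sketch with hypothetical values. It also shows a side effect of the comparison: the final row has no next-day close, the comparison against NaN evaluates to False, and that row is therefore labeled 0 rather than left missing.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only.
close = pd.Series([100.0, 101.0, 99.0, 99.0, 102.0])

# shift(-1) aligns tomorrow's close with today's row; the last row becomes NaN.
next_close = close.shift(-1)

# 1 when tomorrow's close is strictly greater than today's, otherwise 0.
price_rise = np.where(next_close > close, 1, 0)

print(price_rise.tolist())  # [1, 0, 0, 1, 0]
```

Because the last label is 0 and not NaN, a later call to dropna() will not remove that row. Whether to drop it manually is a judgment call that the article does not address.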
dataset = dataset.dropna()
Next, we drop all the rows storing NaN values by using the dropna() function.
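As a rough illustration of how many rows this removes, the sketch below builds 50 made-up closes and two of the moving-average columns. The data is hypothetical. The point is that the longest lookback determines how many leading rows are lost.

```python
import numpy as np
import pandas as pd

# 50 hypothetical closing prices, for illustration only.
df = pd.DataFrame({"Close": np.arange(1.0, 51.0)})
df["3day MA"] = df["Close"].shift(1).rolling(window=3).mean()
df["30day MA"] = df["Close"].shift(1).rolling(window=30).mean()

rows_before = len(df)
df = df.dropna()  # removes every row that has a NaN in any column
rows_after = len(df)

print(rows_before, rows_after)  # 50 20
```

The shifted 30-day average needs 30 earlier closes, so the first 30 rows are dropped and the remaining rows keep their original index labels, starting at 30.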
X = dataset.iloc[:, 4:-1]
y = dataset.iloc[:, -1]
We then create two objects storing the input and the output variables. The dataframe X stores the input features: the columns from the fifth column (index 4) of the dataset up to and including the second-to-last column. The last column is stored in y, a pandas Series holding the value we want to predict, i.e. the price rise.
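The positional slice depends on column order. The sketch below assumes the first four columns are Open, High, Low and Close, followed by the features and then Price_Rise. That ordering is an assumption made here, since the column layout is set in the previous installment, and only two feature columns are included to keep the example short.

```python
import pandas as pd

# Hypothetical two-row frame with the assumed column order.
df = pd.DataFrame({
    "Open": [10.0, 11.0], "High": [12.0, 12.5],
    "Low": [9.5, 10.5], "Close": [11.0, 12.0],
    "H-L": [2.5, 2.0], "O-C": [1.0, 1.0],
    "Price_Rise": [1, 0],
})

X = df.iloc[:, 4:-1]  # every column from index 4 up to, but excluding, the last
y = df.iloc[:, -1]    # the last column only, returned as a Series

feature_names = list(X.columns)
print(feature_names)     # ['H-L', 'O-C']
print(type(y).__name__)  # Series
```

If the dataset carries extra columns before the features, such as volume, the start index 4 would need adjusting. Selecting the feature columns by name is a more defensive alternative.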
Splitting the dataset
split = int(len(dataset)*0.8)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
In this part of the code, we will split our input and output variables to create the test and train datasets. This is done by creating a variable called split, which is defined to be the integer value of 0.8 times the length of the dataset.
We then slice the X and y variables into four separate objects: X_train, X_test, y_train and y_test. This split is an essential step in any machine learning workflow. The model uses the training data to arrive at its weights, and the test dataset shows how the model performs on new data it has not seen. Because the test dataset also holds the actual output values, we can compare them with the model's predictions to judge its accuracy. Later in the code we will look at the confusion matrix, which summarizes how many of the model's predictions were correct and incorrect for each class.
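The slicing above keeps the rows in their original order, so the first 80% go to training and the last 20% to testing, with no shuffling. A minimal sketch with ten made-up rows (hypothetical data) shows the resulting sizes:

```python
import numpy as np
import pandas as pd

# Ten hypothetical observations in chronological order.
X = pd.DataFrame({"feature": np.arange(10.0)})
y = pd.Series([0, 1, 0, 1, 1, 0, 1, 0, 1, 1])

split = int(len(X) * 0.8)  # 8 for ten rows; int() truncates toward zero

# Plain slices are positional: earlier rows train, later rows test.
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(len(X_train), len(X_test))  # 8 2
```

Keeping the order means every test row comes after every training row in time, which matches how the model would be used on future prices.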
In the next installment, the author will discuss Feature Scaling.
Visit https://www.quantinsti.com/ for ready-to-use functions as applied in trading and data analysis.
Disclosure: Interactive Brokers
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.