An Introduction to Unsupervised Learning for Trading


Posted March 3, 2023 at 9:34 am

Ashutosh Dave
QuantInsti

In the previous blogs, we examined supervised learning algorithms like linear regression in detail. In this blog, we look at what unsupervised learning is and how it differs from supervised learning.

We then discuss some use cases of unsupervised learning in investment and trading, and explore two unsupervised techniques in particular, k-means clustering and principal component analysis (PCA), with examples in Python.

Contents

  • What is unsupervised learning?
  • Supervised vs unsupervised learning
  • When do we use unsupervised algorithms?
  • Clustering algorithms
  • Concept of clustering
  • K-means clustering algorithm
  • Dimensionality Reduction
  • Concept of dimensionality reduction
  • Principal Component Analysis
  • Other types of unsupervised algorithms
  • Challenges in unsupervised learning

What is unsupervised learning?

As the name suggests, ‘unsupervised’ learning takes place when there is no supervisor or teacher and the learner learns on her own.

For instance, consider a child who sees and tastes an apple for the very first time. She registers the colour, the texture, the taste and the smell of the fruit. The next time she sees an apple, she knows that both this and the previous apple are similar objects as they have very similar characteristics.

She also knows that it is very different from an orange. Still, she does not know that it is called an ‘apple’ in human-speak, because she has never been given the label.

Such learning, where no labels exist (there is no teacher) but the learner can still discover patterns on her own, is referred to as unsupervised learning.

In the context of machine learning algorithms, unsupervised learning occurs when an algorithm learns from plain examples without any associated response and determines the data patterns on its own.

In the next section, we will discuss how this type of learning differs from the other popular type of machine learning, namely supervised learning.


Supervised vs unsupervised learning

As the name suggests, supervised learning occurs under supervision: when the algorithm predicts a value for a sample from the training data, it is told whether the prediction was correct or not.

This is possible because the correct values are stored as ‘labels’ (also called the ‘target variable’), which are passed to the algorithm along with the input data. Common supervised learning tasks are classification and regression.

In classification tasks, the label is the correct class to which the sample belongs, whereas in regression, the actual value of the dependent variable (Y) serves as the benchmark against which the prediction is compared. The algorithm can then adjust its parameters to reduce the prediction error.

Thus, the main goal of supervised learning is to build a robust predictive model.
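
To make the idea of learning from labels concrete, here is a minimal sketch of a supervised regression fit. The synthetic data, the helper name fit_linear, and the learning rate and step count are all assumptions chosen for illustration; they do not come from the original post. The point is only that each prediction is compared against a known label Y, and the parameters are adjusted to shrink that error.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 3*x + 2 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)  # y holds the labels


def fit_linear(x, y, lr=0.1, n_steps=500):
    """Fit y ~ w*x + b by gradient descent on the mean squared error.

    Hypothetical helper: lr and n_steps are illustrative defaults.
    """
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        error = (w * x + b) - y             # prediction compared with the label
        w -= lr * 2.0 * np.mean(error * x)  # gradient of the MSE w.r.t. w
        b -= lr * 2.0 * np.mean(error)      # gradient of the MSE w.r.t. b
    return w, b


w, b = fit_linear(x, y)
mse = np.mean((w * x + b - y) ** 2)
print(f"w={w:.2f}, b={b:.2f}, mse={mse:.4f}")
```

Without the label array y, the error term could not be computed at all, which is exactly what separates this setting from the unsupervised one described next.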

On the other hand, in unsupervised learning, we only pass the input data, and there are no labels. Unsupervised models seek to find the underlying or hidden structure or distribution in the data in order to learn more about the data.

In other words, unsupervised learning is where we only have input data and no corresponding output variables, and the main goal is to learn more or discover new insights from the input data itself.

Clustering algorithms, which group the data based on patterns that the machine detects, are a common example of unsupervised algorithms.

For example, let us consider a situation in which we have a few data points based on two input features X1 and X2.

  • If we want our algorithm to classify/categorize the data into two known classes, we will use a supervised classification algorithm.
  • On the other hand, if we want the algorithm to tell us how the data is structured, we would use an unsupervised clustering algorithm.
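
The two cases above can be sketched side by side. The example below is illustrative only: the data is synthetic, the supervised model is a simple nearest-centroid classifier (swapped in for brevity, not a method named in this post), and the unsupervised model is a bare-bones k-means written with NumPy. The function names nearest_centroid_fit, assign and kmeans are hypothetical helpers.

```python
import numpy as np

# Synthetic data (an assumption): two well-separated groups in features X1, X2.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # known classes; only the supervised model sees them


def assign(X, centroids):
    """Return the index of the closest centroid for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def nearest_centroid_fit(X, y):
    """Supervised: one centroid per known class, computed from the labels."""
    return np.array([X[y == c].mean(axis=0) for c in np.unique(y)])


def kmeans(X, k, n_iter=20, seed=0):
    """Unsupervised: plain k-means (Lloyd's algorithm); no labels are used."""
    init_rng = np.random.default_rng(seed)
    centroids = X[init_rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return assign(X, centroids), centroids


# Supervised classification: predictions can be scored against the labels.
class_centroids = nearest_centroid_fit(X, y)
supervised_acc = np.mean(assign(X, class_centroids) == y)

# Unsupervised clustering: cluster ids are arbitrary, so compare up to a swap of 0 and 1.
cluster_ids, _ = kmeans(X, k=2)
cluster_agreement = max(np.mean(cluster_ids == y), np.mean(cluster_ids != y))

print(f"supervised accuracy: {supervised_acc:.2f}")
print(f"cluster/label agreement: {cluster_agreement:.2f}")
```

Note that the clustering step never touches y. The labels are used afterwards only to check how well the discovered groups line up with the known classes, something that would not be possible with genuinely unlabeled data.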

Stay tuned for the next installment to learn about unsupervised algorithms.

Originally posted on QuantInsti Blog.

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
