# IB Quant Blog

### Basic Operations on Stock data using Python

Python has emerged as the fastest growing programming language, a rise driven by multiple factors: ease of learning, readability, conciseness, a strong developer community, and applicability across domains. Python has found wide acceptance in trading too, which has led to Python-based analytics platforms, Python APIs, and trading strategies built in Python.

The objective of this post is to illustrate how easy it is to learn Python and apply it to formulate and analyze trading strategies. If you are new to programming, this post might just help you overcome your fear of programming. Also, don’t forget to check out the links at the end of this post to some exciting trading strategies published on our blog.

Let us run through some basic operations that can be performed on stock data using Python. We start by reading the stock data from a CSV file. The CSV file contains the Open-High-Low-Close (OHLC) and Volume numbers for the stock.
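A minimal sketch of this step is shown below; the file name and column names (DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOLUME) are assumptions for illustration and should be adapted to your own data set.

```python
# Read OHLCV data from a CSV file into a pandas DataFrame,
# parsing the DATE column and using it as the index.
# File name and column names are hypothetical.
import pandas as pd

df = pd.read_csv('stock_data.csv', parse_dates=['DATE'], index_col='DATE')
print(df.head())
```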

The ‘TIME’ column seen here specifies the closing time of the day’s trading session. To delete the column we can simply use the ‘del’ command.

Now, let us use the type function to check whether the object is a pandas datetime index.
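Both operations are one-liners in pandas. A minimal sketch of the two steps above, reusing the df frame loaded earlier (the ‘TIME’ column name is taken from the text):

```python
# Delete the TIME column and verify that the index is a pandas DatetimeIndex.
del df['TIME']
print(type(df.index))
# Expected output: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
```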

I would like to know the number of trading days (the number of rows) in the given data set. It can be done using the count method.

What if I want to know the maximum close price that was reached in the given period? This is made possible by using the max method.

Is it also possible to know the date on which this maximum price was reached? To find the respective date we apply the index property as shown below.
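The three lookups above, the count, the maximum close, and the date of that maximum, can be sketched as follows (the ‘CLOSE’ column name is an assumption):

```python
# Number of trading days, the maximum closing price,
# and the date(s) on which that maximum was reached.
print(df['CLOSE'].count())                  # number of rows (trading days)
max_close = df['CLOSE'].max()               # highest close in the period
print(df[df['CLOSE'] == max_close].index)   # the index property gives the date(s)
```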

Let us compute the daily percentage change in closing price. We add a new column of ‘Percentage_Change’ to our existing data set. In the next line of code, we have filtered the percent change column for all the values greater than 1.0. The result has been presented below.
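A possible sketch of this step, assuming the change is expressed in percent so that the 1.0 threshold corresponds to a 1% move:

```python
# Daily percentage change of the close, added as a new column,
# then filtered for days where the close moved by more than 1%.
df['Percentage_Change'] = df['CLOSE'].pct_change() * 100
print(df[df['Percentage_Change'] > 1.0])
```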

Finally, let us add a couple of indicators. We compute the 20-day simple moving average and the 5-day average volume. We can add more indicators to our data frame and then analyze the stock trend to see whether it is bullish or bearish. You can learn more about how to create various technical indicators in Python here.
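A minimal sketch of these two indicators using pandas rolling windows (column names are assumptions):

```python
# 20-day simple moving average of the close and 5-day average volume.
df['SMA_20'] = df['CLOSE'].rolling(window=20).mean()
df['Avg_Volume_5'] = df['VOLUME'].rolling(window=5).mean()
print(df.tail())
```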

In this short post, we covered some simple ways to analyze a stock data set and build a better understanding of the data. Can you think of building a trading strategy using similar basic operations and simple indicators? Here are links to articles on Python that you can explore for your own trading needs.

• Trading Using Machine Learning In Python – SVM (Support Vector Machine)
• Strategy using Trend-following Indicators: MACD, ST and ADX
• Sentiment Analysis on News Articles using Python
• Python Trading Strategy in Quantiacs Platform

In our upcoming posts, we will provide more ways and methods that can be used for trading using Python. Keep following our posts.

Next Step

If you want to learn various aspects of Algorithmic trading then check out QuantInsti’s Executive Programme in Algorithmic Trading (EPAT™).

Milind Paradkar holds an MBA in Finance from the University of Mumbai and a Bachelor’s degree in Physics from St. Xavier’s College, Mumbai. At QuantInsti®, Milind is involved in creating technical content on Algorithmic & Quantitative trading. Prior to QuantInsti®, Milind had worked at Deutsche Bank as a Senior Analyst where he was involved in the cash flow modeling of structured finance deals covering Asset-backed Securities (ABS) and Collateralized Debt Obligations (CDOs).

### Deep Learning for Trading: Part 1

In the last few years, deep learning has gone from being an interesting but impractical academic pursuit to a ubiquitous technology that touches many aspects of our lives on a daily basis – including in the world of trading. This meteoric rise has been fuelled by a perfect storm of:

• Frequent breakthroughs in deep learning research which regularly provide better tools for training deep neural networks
• An explosion in the quantity and availability of data
• The availability of cheap and plentiful compute power
• The rise of open source deep learning tools that facilitate both the practical application of the technology and innovative research that drives the field ever forward

Deep learning excels at discovering complex and abstract patterns in data and has proven itself on tasks that have traditionally required the intuitive thinking of the human brain to solve. That is, deep learning is solving problems that have thus far proven beyond the ability of machines.

However, as anyone who has used deep learning in a trading application can attest, the problem is not nearly as simple as just feeding some market data to an algorithm and using the information to help make trading decisions. Some of the common issues that need to be solved include:

1. Working out a sensible way to frame the forecasting problem, for example as a classification or regression problem.

2. Scaling data in a way that facilitates training of the deep network.

3. Deciding on an appropriate network architecture.

4. Tuning the hyperparameters of the network and optimization algorithm such that the network converges sensibly and efficiently. Depending on the architecture chosen, there might be a couple of dozen hyperparameters that affect the model, which can provide a significant headache.

5. Coming up with a cost function that is applicable to the problem.

6. Dealing with the problem of an ever-changing market. Market data tends to be non-stationary, which means that a network trained on historical data might very well prove useless when used with future data.

7. There may be very little signal in historical market data with respect to the future direction of the market. This makes sense intuitively if you consider that the market is impacted by more than just its historical price and volume. Further, pretty much everyone who trades a particular market will be looking at its historical data and using it in some way to inform their trading decisions. That means that market data alone may not give an individual much of a unique edge.

The first five issues listed above are common to most machine learning problems and their resolution represents a big part of what applied data science is all about. The implication is that while these problems are not trivial, they are by no means deal breakers.
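As a concrete illustration of the first two issues, the sketch below frames next-day direction as a binary classification target and standardises the inputs. The synthetic price series, the features, and the library choice (scikit-learn) are all assumptions made for illustration only.

```python
# Issues 1 and 2: frame the forecast as binary classification and scale the features.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic close prices stand in for real market data in this sketch.
close = pd.Series(100 * np.cumprod(1 + np.random.normal(0, 0.01, 500)))

returns = close.pct_change()
features = pd.DataFrame({'ret_1d': returns,
                         'ret_5d': close.pct_change(5)}).dropna()
features = features.iloc[:-1]                                   # last day has no next-day label
target = (returns.shift(-1) > 0).astype(int).loc[features.index]  # next-day up/down

X = StandardScaler().fit_transform(features)   # zero mean, unit variance per feature
y = target.values
```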

What is Keras?

Keras is a high-level API for building and training neural networks. Its strength lies in its ability to facilitate fast and efficient research, which of course is very important for systematic traders, particularly those of the DIY persuasion for whom time is often the limiting factor to success. Keras is easy to learn and its syntax is particularly friendly. Keras also plays nicely with CPUs and GPUs and can integrate with the TensorFlow, Theano and CNTK backends – without limiting the flexibility of those tools. For example, pretty much anything you can implement in raw TensorFlow, you can also implement in Keras, likely at a fraction of the development effort.
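As an illustration of how compact the API is, here is a minimal sketch of a small fully connected network in the Python Keras interface (the experiments in this series use the R interface, and the layer sizes and input dimension below are arbitrary):

```python
# Illustrative only: a small fully connected network in the Python Keras API.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(10,)),  # 10 input features (assumed)
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')                    # e.g. probability of an up move
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```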

Keras is also available in R: the keras R package provides an interface to the underlying Python implementation.

What’s next?

In the deep learning experiments that follow in Part 2 and beyond, we’ll use the R implementation of Keras with TensorFlow backend. We’ll be exploring fully connected feedforward networks, various recurrent architectures including the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), and even convolutional neural networks which normally find application in computer vision and image classification.

Stay tuned.

### Asset Allocation for Sector ETFs: An Empirical Perspective on Estimation Error

In this article, Majeed Simaan uses the quantmod and lubridate packages.

Introduction

The conventional wisdom in finance implies that investors should make rational decisions in which risk is compensated by reward: in order to achieve a greater reward, investors need to bear more risk. This is the ideal view in financial economic thought and the foundation of Modern Portfolio Theory (MPT). In the following, I would like to address this view in the presence of uncertainty, which is the inevitable ingredient of day-to-day decision making.

Let us think about an investor who is interested in allocating his wealth among a set of assets. As a rational investor, he should choose an optimal allocation among the set that yields the best reward for the level of risk he is willing to take. By reward, I refer to how much he expects to earn on his portfolio decision, whereas risk denotes how volatile this prospect will be. Formally, the former is measured by the expected return of his portfolio, while the latter is proxied by the standard deviation of his portfolio return. This paradigm is known as the mean-variance (MV) model, pioneered by Harry Markowitz in the early 1950s.

One of the underlying assumptions of the MV model is that the investor possesses full information about the underlying assets. Specifically, it assumes that he knows the model's inputs without any uncertainty. In reality, however, as decision makers we can only form our views about these assets from historical data or from speculation about the future of the underlying assets (or the sectors/market). This, inevitably, introduces what is called estimation error into the asset allocation problem.

It has been well established in the recent MPT literature that estimation error impairs the performance of MV optimal portfolios. It appears that investors are often better off allocating their wealth equally among the underlying assets rather than trying to solve for an optimal allocation. This practice is known as the naive approach, since it does not incorporate information about the underlying assets. Nonetheless, a more recent strand of the literature debates whether this naive strategy still outperforms MV portfolios once estimation error is accounted for in the portfolio optimization problem.

In this article, I will test the above implications using monthly asset returns for 9 sector ETFs dating between Jan 1999 and July 2017. I exclude one sector, which is the XLRE ETF, due to limited data availability. I will demonstrate the impact of estimation error on constructing MV optimal portfolios and then refer the reader to possible remedies used in the literature. In doing so, I hope to address the importance of portfolio optimization in the presence of uncertainty and the significance of taking into account estimation error.

Picture Source: A 3D perspective on the mean-variance efficient frontier when estimation error is considered (source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2495621)

ETF Data

I use the quantmod package to download data on the 9 sector ETFs and the lubridate package to handle date formats.
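The article’s code is in R; as a rough Python analogue (not the author’s code), one could pull monthly data for the nine SPDR sector ETFs with the yfinance package, which is an assumption on my part:

```python
# Illustrative Python analogue of the quantmod download step.
import yfinance as yf

tickers = ['XLB', 'XLE', 'XLF', 'XLI', 'XLK', 'XLP', 'XLU', 'XLV', 'XLY']
prices = yf.download(tickers, start='1999-01-01', end='2017-07-31',
                     interval='1mo', auto_adjust=True)['Close']
returns = prices.pct_change().dropna()   # monthly asset returns
```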

In and Out of Sample

I split the data into two parts, in-sample and out-of-sample. The former represents the window over which the asset allocation decision is made, whereas the latter denotes the realization period.

The parameters of the two periods differ significantly, especially since the first period contains the dot-com bubble and part of the recent financial crisis. Looking at the difference between the in-sample and out-of-sample parameters below, it is evident that estimation error in the mean returns is more severe than in the second moments, i.e. the variances and covariances of asset returns:
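A hedged sketch of the split and the comparison of sample moments, continuing the Python analogue above (the cut-off date is an assumption, not the author’s choice):

```python
# Split monthly returns into in-sample and out-of-sample windows
# and compare their sample moments.
in_sample = returns.loc[:'2008-12-31']
out_sample = returns.loc['2009-01-01':]

mean_diff = in_sample.mean() - out_sample.mean()   # error in first moments
cov_diff = in_sample.cov() - out_sample.cov()      # error in second moments
print(mean_diff.abs().mean(), cov_diff.abs().values.mean())
```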

This piece of evidence is one of the main motivations behind the strand of portfolio practice that focuses on the global minimum variance (henceforth GMV) portfolio. The GMV portfolio requires only information about the volatilities and covariances of the asset returns, unlike efficient MV portfolios, which also require knowledge of the mean returns. I shall get back to this issue later on.

MV Optimal Portfolios

There are a number of R packages available to perform portfolio optimization. Nevertheless, I will solve for optimal MV portfolios using a function that I designed myself. The function is coded using an R base constrained optimization function. In doing so, I hope to provide the reader with some exposure to the underlying science behind the practice of portfolio optimization.

Objective Function

The objective function is defined in terms of expected utility (EU). A decision maker chooses the allocation that maximizes the EU of his terminal wealth. Hence, such a function takes into account two main components: the portfolio mean return and the volatility of the portfolio return. This represents the reward-risk trade-off, in which the EU increases with the former but decreases with the latter, an environment where risk is non-preferable, i.e. investors are risk-averse.

The EU function takes 4 arguments. The first input is a vector X that denotes the allocation among the assets. The second and third inputs are the mean vector, M, and the covariance matrix, S, of the asset returns. These two represent the knowledge of the decision maker about the underlying assets. Finally, the fourth input is the risk-aversion of the decision maker, denoted by k.

The k parameter determines the preference of the decision maker in terms of risk tolerance. Let us consider two extreme cases. If the investor is only concerned with maximizing reward, then k is close to zero such that his utility is mainly determined by the portfolio expected return, regardless of the associated risk. On the other hand, if k goes to infinity, then it implies that the utility is mostly affected by the portfolio risk, whereas the utility derived from the portfolio expected return is trivial. In the former case, the investor chooses the asset with the highest mean return; while in the latter he would choose the GMV portfolio since risk is the main component that affects his utility.

We will consider k as given and let it range between 2.5 and 100. Clearly, the larger k is, the more risk-averse the investor. Additionally, for the mean vector, M, and the covariance matrix, S, we will use the sample estimates for now. Thus, k, M, and S are treated as given, whereas the main control variable the investor needs to choose is the vector of portfolio weights, X. Therefore, with a given level of risk-aversion and equipped with both M and S, the investor chooses the allocation that maximizes his EU.
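The article codes this objective in R; the sketch below is a Python rendering of the standard quadratic mean-variance utility, EU(X) = X'M - (k/2) X'SX, which captures the trade-off described above (the author's exact functional form may differ in detail):

```python
import numpy as np

def expected_utility(X, M, S, k):
    """Quadratic expected utility: portfolio mean minus a variance penalty.

    X : portfolio weights, M : mean vector, S : covariance matrix,
    k : risk-aversion (larger k penalises portfolio variance more heavily).
    """
    X = np.asarray(X)
    return X @ M - 0.5 * k * X @ S @ X
```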

Optimization

I use numerical optimization to construct optimal portfolios. The base constrOptim R function allows users to find the minimum point given an initial guess of the control variable. In addition, a gradient can be supplied to make the optimization more efficient. Numerical optimization tools rely on iterative search algorithms to locate the minimum (optimal) point; hence, if one is able to direct this search, the search becomes more efficient and reliable. This is where the gradient comes into the picture.

I define the following function that takes four arguments: M, S, k, and BC. BC is a list that contains the budget constraints with respect to which the investor chooses his optimal allocation.

The basic budget constraint is that the investor allocates all of his wealth to the portfolio, such that the allocated proportions sum to 1. Other constraints may include limits on positions in individual assets or the exclusion of short sales. The latter are common constraints in the practice of portfolio management, whereas the basic budget constraint alone is usually used in the theoretical literature to derive tractable analytical results. I define the following two BC items:

Given information about the mean vector and the covariance matrix of the assets, the desired level of risk-taking, and some budget constraints, the above MV_portfolio function returns the optimal portfolio weights. It does so by initializing X to an equally weighted portfolio, which also satisfies the budget constraints, i.e. a weight of 1/N in each of the N assets.
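The article solves this with R's constrOptim; as a rough Python analogue under the same setup, one can use scipy.optimize.minimize with the full-investment constraint and, optionally, no-short-sales bounds, starting from the equally weighted portfolio (this reuses the expected_utility sketch above and is illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

def mv_portfolio(M, S, k, short_sales=True):
    """Maximise expected utility subject to the weights summing to one."""
    n = len(M)
    x0 = np.repeat(1.0 / n, n)                                 # equally weighted start
    budget = [{'type': 'eq', 'fun': lambda x: x.sum() - 1.0}]  # full-investment constraint
    bounds = None if short_sales else [(0.0, 1.0)] * n         # optional no short sales
    res = minimize(lambda x: -expected_utility(x, M, S, k),    # maximise EU
                   x0, method='SLSQP', bounds=bounds, constraints=budget)
    return res.x
```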

The MV Efficient Frontier

One view is that investors are compensated, in terms of portfolio expected return, for the additional risk they are willing to take. This results in the classical textbook parabola that captures the reward-risk trade-off, known as the MV efficient frontier. This parabola is the cornerstone of almost every MBA finance class. Nonetheless, such a trade-off in practice, i.e. when investors face estimation error, is not as clear.
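One way to trace such a frontier empirically is to solve the portfolio problem for a grid of risk-aversion values and record each optimal portfolio's mean return and standard deviation. A sketch building on the Python functions above:

```python
# Trace a frontier by varying the risk-aversion coefficient k.
M = in_sample.mean().values
S = in_sample.cov().values

frontier = []
for k in np.linspace(2.5, 100, 50):               # the k range used in the article
    w = mv_portfolio(M, S, k)
    frontier.append((np.sqrt(w @ S @ w), w @ M))  # (risk, expected return) pair
```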

Basic Budget Constraints

I demonstrate this issue in the following figure. The y-axis represents the portfolio mean return, while the x-axis denotes the portfolio risk, proxied by the standard deviation of the portfolio return. The figure has two lines. The solid line is the classical MV efficient frontier, constructed using the out-of-sample data. This is the hypothetical case, which serves as our benchmark. The dashed line, on the other hand, represents the frontier for the in-sample case, which is the more realistic one.

The reward-risk trade-off is very evident in the solid line. This implies that, if we could assess the future reward-risk trade-off, there would be an additional reward for tolerating more risk. However, the dashed line tells us a different story. Specifically, it implies that investors get punished for taking more risk, something that contradicts the whole foundation of financial economic thought.

The reason for the above evidence is the presence of estimation error. At the top left of the dashed line, I highlight the in-sample GMV portfolio. In this case, I use the covariance matrix from the in-sample window to construct the portfolio with the lowest standard deviation. Nonetheless, as we move away from this point, the portfolio also relies on the assets' mean returns, which are associated with greater estimation error. Clearly, this supports the conventional wisdom that, if you deviate from the GMV portfolio, your portfolio will suffer from greater estimation error.

Standing next to the GMV portfolio is the naive one, denoted by a cross. Clearly, the naive strategy dominates most of the MV portfolios (it lies above and to the left of most points on the dashed line). However, that does not hold true for the GMV portfolio: in fact, the GMV portfolio dominates the naive one by achieving a higher mean return for lower risk. In any case, we also observe that, if estimation error were absent (black line), the naive portfolio would be considered MV sub-optimal.

It is common in the practice of portfolio management to use ad-hoc techniques, such as limiting the exposure to a certain sector or avoiding short sales altogether. While such practice seems sub-optimal from a theoretical point of view, it has important implications for estimation error.

I repeat the same exercise as before but with the addition of short-sales constraints. In the same fashion as the previous figure, I demonstrate the case when short sales are not allowed. I highlight the new results in red and compare them with the previous ones as follows.

In the absence of estimation error (i.e. the hypothetical case), it is clear that solving a constrained optimization problem results in a sub-optimal solution. In this case, the red solid line lies below the black solid line, such that portfolio optimization that excludes short sales yields sub-optimal MV portfolios compared with optimization that does not impose such a constraint. Nevertheless, we also observe that short-sales constraints mitigate the risk exposure of the investor: for the same level of risk-aversion, the investor ends up taking less risk. Alternatively, one can argue that no-short-sales investors are more risk-averse in nature.

Looking at the more realistic case, i.e. the presence of estimation error, it is clear from the red dashed line that short-sales constraints limit the exposure of the investor to excessive risk-taking. Nonetheless, we can still see that investors get punished for taking excessive risk for which they are not rewarded accordingly. On the other hand, we still observe that the GMV portfolio dominates the naive one and that there is only a small change in the location of the GMV point.

In either case, we can argue that short-sales constraints limit the exposure of the investor to excessive risk. What is more interesting, nevertheless, is the following observation. While adding short-sales constraints seems MV sub-optimal from a full-information perspective (i.e. the red solid line versus the black solid line), this is not the case when we take estimation error into account. Clearly, the red dashed line does not appear to be more MV sub-optimal than the black dashed one; in fact, it appears that the former mitigates the underperformance caused by estimation error.

Next Steps

Most of the recent literature on portfolio optimization proposes different ways to mitigate estimation error. These approaches include Bayesian and shrinkage methods. In fact, it has been established that some of the shrinkage approaches are consistent with imposing short-sales constraints. Nevertheless, due to the greater estimation error associated with the assets' mean returns, the focus has largely been limited to the GMV portfolio alone. In the next article, I would like to devote the discussion to the GMV portfolio and apply some of these techniques to yield estimation-error-robust portfolios. Stay tuned!

Appendix

You can access the complete R source code used in this article via my R Corner available at my homepage.

Majeed Simaan is a PhD candidate in Finance at Rensselaer Polytechnic Institute.  His research interests revolve around Banking and Risk Management, with emphasis on asset allocation and pricing. He is well versed in research areas related to banking, asset pricing, and financial modeling. He has been involved in a number of projects that apply state of the art empirical research tools in the areas of financial networks (interconnectedness), machine learning, and textual analysis. His research has been published in the International Review of Economics and Finance and the Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence.

Before joining RPI, Majeed pursued graduate training in the area of Mathematical Finance at the London School of Economics (LSE). He has a strong quantitative background in both computing and statistical learning. He holds both BA and MA in Statistics from the University of Haifa with specialization in actuarial science.

### From Potential to Proven: Why AI is Taking Off in the Finance World

Kris Longmore shares that “Until recently, I was working as a machine learning consultant to financial services organizations and trading firms in Australia and the Asia Pacific region. A few months ago, I left that world behind to join an ex-client’s proprietary trading firm. I thought I’d jot down a few thoughts about what I saw during my consulting time because I witnessed some interesting changes in the industry in a relatively short period of time that I think you might find interesting too. Enjoy!”

Perceptions around Artificial Intelligence (AI) in the finance industry have changed significantly, as skepticism gives way to a rising Fear of Missing Out (FOMO) among asset managers and trading houses.

Big Data and AI Strategies – Machine Learning and Alternative Data Approaches to Investing, JP Morgan’s 280-page report on the future of machine learning in the finance industry, paints a picture of a future in which alpha is generated from data sources such as social media, satellite imagery, and machine-classified company filings and news releases.

Well, that future is already here.

Amongst value managers, I saw skepticism give way to a sense of anxiety over being late to the party. The first question I was asked by nearly every value manager I met over the last year or so was: “what is everyone else doing with machine learning?”

This sense of FOMO is arising now because general knowledge of the potential of machine learning has reached a critical mass amongst the decision makers and management across the industry.

Despite the seclusion inherent in our industry, where ‘secret sauce’ is closely guarded, the fruits of the labour of the early adopters are gaining ever-increasing public exposure, shifting the perception of the technology from ‘potential’ to ‘proven’.

In short, finance is catching up to the many other industries where this technology is already in common use.

Shifting attitudes within the quant community

When my consulting company first started applying and recommending machine learning solutions to financial problems, we encountered mixed attitudes from the industry. While a few were enthusiastic adopters who could see the potential, the attitude that machine learning was less than useful – even dangerous – and dismissals of the technology as ‘voodoo science’ were incredibly common.

Surprisingly, these attitudes often came from other quant researchers.

Within the quant community, I’ve witnessed first-hand this attitude gradually giving way to one of recognition of machine learning as a useful tool. I’ve even noted some folks who decried the approach now calling themselves ‘machine learning experts’ on their business cards and LinkedIn profiles. Times really have changed, and they changed in an astonishingly short space of time.

More recently, I’ve seen an even more significant change, as participants increasingly recognise machine learning as the key to unlocking the next generation of alpha. Suddenly, it feels like the prevailing attitude towards machine learning and AI is one of excited and enthusiastic adoption, as opposed to reluctance and skepticism.

Amid the growing consensus that alpha is discoverable in alternative data, our own work and the work of others suggests that alpha from such sources may be uncorrelated with traditional factors like value and momentum. Perhaps, for the time being at least, they can coexist and even provide new dimensions of diversification.

Changing the way we look for alpha

Alpha generation has always been about information advantage – either having access to uncommon insights gained through ingenuity or common insights acted upon before everyone else.

Machine learning and artificial intelligence are simply the modern evolution of a repeating historical pattern in the context of today’s big data world. For example, interpreting satellite imagery of a retailer’s car park reveals insight into its sales figures before they are released to the market. Deriving sentiment from Twitter or Weibo and relating it to an asset’s returns provides an uncommon insight gained through ingenuity.

Artificial intelligence excels at tasks like these to the point that such AI is rapidly becoming a commodity.

As the pool of data (be it alternative, big, structured or unstructured) continues its exponential growth, machine learning and artificial intelligence tools will increasingly be adopted for processing and unravelling it – simply because they are the best tools for the job.

JP Morgan believes there will come a time when they are the only tools for the job.

My experience tells me that that time has already arrived – fund managers who are slow to the party would do well to get on board, not only to build competitive advantage, but also to maintain what they’ve already got.

### Statistical Computing with R

The 2017 IASC-ARS/NZSA Conference took place at the University of Auckland in New Zealand earlier this month.

The event also marked the retirement of Professor Ross Ihaka, one of the co-creators of the R programming language.

Here are some of the highlights from the conference:

Simon Urbanek, a co-president of the R Foundation and researcher at AT&T Research Lab, delivered an exciting speech on "R in times of Growing User Base and Big Data." He also demonstrated RCloud – a social environment for coding in R.

Professor Ross Ihaka presented “Could Do Better… A Report Card for Statistical Computing.” His presentation tracked the history of R to its current state, showing possible paths to improvement.  Some of the key points included “eliminate first-class environments” and “use true scalar values.” For further developments on this research, stay tuned by following the University of Auckland Department of Statistics website: https://www.stat.auckland.ac.nz/en.html

Professor Jenny Bryan, a very popular R speaker, and Associate Professor of Statistics at the University of British Columbia, gave an excellent lecture on organizing and optimizing R coding. She demonstrated how easy it is to use RStudio Projects.

Professor Paul Murrell, an Associate Professor at the University of Auckland and a member of the R Foundation Core Development team, spoke on “gridSVG Then and Now.” Prof. Murrell, an expert on the graphics system in R, showcased its pros and cons and concluded that “gridSVG” is faster (than its former self). For details on his research, visit: https://www.stat.auckland.ac.nz/people/pmur002

Hadley Wickham, Chief Scientist at RStudio, taught us how to promote R packages in 3 easy steps (see picture below) by using GitHub, README and pkgdown.

Picture Source: Hadley Wickham's presentation on "Promoting Your R Package" at the University of Auckland for the 2017 IASC-ARS/NZSA Conference. For more details, visit Hadley’s website: http://hadley.nz/

For additional info on the conference, visit the University of Auckland Department of Statistics https://www.stat.auckland.ac.nz/en.html and NZSA websites https://www.nzsa2017.com/.
