Posted August 10, 2021 at 10:51 am
When you’re working with large universes of stock data, you’ll come across a lot of challenges.
The challenges are well understood, but dealing with them is not always straightforward.
One significant challenge is gaps in data.
Quant analysis gets very hard if you have missing or misaligned data.
If you’re working with a universe of 1,000 stocks, life is a lot easier if you have an observation for each stock for each trading date, regardless of whether it actually traded that day. That way, the data set always has exactly trading_days * number_of_stocks rows.
If you work with “wide” matrix-like data, these gaps are obvious because you have one row for every date in your data set, and the columns represent an observation for each ticker.
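To make the wide layout concrete, here is a minimal sketch using tidyr's pivot_wider. The data and the names long_returns and wide_returns are made up for illustration; they are not from the original scripts.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format returns with two implicit gaps:
# there is no TSLA row for date 1 and no FB row for date 3.
long_returns <- tibble(
  date    = c(1, 1, 2, 2, 2, 3, 3),
  ticker  = c("AMZN", "FB", "AMZN", "FB", "TSLA", "AMZN", "TSLA"),
  returns = 1:7 / 100
)

# One row per date, one column per ticker.
# Date/ticker combinations with no observation show up as NA.
wide_returns <- long_returns %>%
  pivot_wider(names_from = ticker, values_from = returns)

wide_returns
```

In the wide version the two gaps appear as NA cells, which is why they are hard to overlook in that shape.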
We usually work with long or “tidy” data, where each row is an observation for one stock on a given day.
How do we work productively with this data, whilst still ensuring that any gaps in our long data are filled in with NAs?
The tidyverse makes this very straightforward. Let me show you!
First, here’s some dummy data to illustrate the problem:
library(tidyverse)
testdata <- tibble(date = c(1,1,2,2,2,3,3),
                   ticker = c('AMZN','FB','AMZN','FB','TSLA','AMZN','TSLA'),
                   returns = 1:7 / 100)
testdata
## # A tibble: 7 x 3
##    date ticker returns
##   <dbl> <chr>    <dbl>
## 1     1 AMZN      0.01
## 2     1 FB        0.02
## 3     2 AMZN      0.03
## 4     2 FB        0.04
## 5     2 TSLA      0.05
## 6     3 AMZN      0.06
## 7     3 TSLA      0.07
Ideally we want a row for every date for every stock – with returns set to NA in the case where data is missing.
That way we can always look up a price by date. And we can always be sure that any grouped operations by ticker return the same size data set.
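To see why the raw data is a problem for grouped operations, here is a small sketch that counts rows per ticker before any gap filling. It rebuilds the same dummy values under the hypothetical name raw_returns so that it runs on its own.

```r
library(dplyr)

# Same values as the dummy data above, rebuilt so this snippet is standalone.
raw_returns <- tibble(
  date    = c(1, 1, 2, 2, 2, 3, 3),
  ticker  = c("AMZN", "FB", "AMZN", "FB", "TSLA", "AMZN", "TSLA"),
  returns = 1:7 / 100
)

# Count observations per ticker in the unfilled data.
raw_counts <- raw_returns %>%
  count(ticker, name = "count")

raw_counts
# AMZN has 3 rows, while FB and TSLA have only 2 each.
```

Any per-ticker calculation that assumes three dates per stock would quietly be working with vectors of different lengths here.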
Turns out that the tidyr::complete function is exactly what we’re looking for. It turns implicit missing values – like the returns for TSLA on date 1 and FB on date 3 – into explicit missing values:
tidydata <- testdata %>%
  complete(date, ticker)
tidydata
## # A tibble: 9 x 3
##    date ticker returns
##   <dbl> <chr>    <dbl>
## 1     1 AMZN      0.01
## 2     1 FB        0.02
## 3     1 TSLA     NA
## 4     2 AMZN      0.03
## 5     2 FB        0.04
## 6     2 TSLA      0.05
## 7     3 AMZN      0.06
## 8     3 FB       NA
## 9     3 TSLA      0.07
Easy!
Now we have a row for every date for every stock.
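One caveat worth sketching: complete(date, ticker) can only build combinations from dates that already appear somewhere in the data. If a date is missing for every ticker, it stays missing. The example below is a hedged illustration with made-up data (gappy, n_trading_days and the assumption that dates 1 to 4 are all trading days are mine, not the author's). It uses tidyr's full_seq to fill in the date axis, then checks the trading_days * number_of_stocks row count.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: date 2 is absent for every ticker.
gappy <- tibble(
  date    = c(1, 1, 3, 3, 4),
  ticker  = c("AMZN", "FB", "AMZN", "TSLA", "FB"),
  returns = c(0.01, 0.02, 0.06, 0.07, 0.08)
)

# Assumption for this sketch: dates 1 through 4 are all trading days.
n_trading_days <- 4
n_stocks <- n_distinct(gappy$ticker)

# Only dates that already exist in the data are used, so date 2 is still missing.
observed_only <- gappy %>%
  complete(date, ticker)

# full_seq() expands date to every value from min to max in steps of 1.
full_grid <- gappy %>%
  complete(date = full_seq(date, period = 1), ticker)

# Row-count check against trading_days * number_of_stocks.
nrow(observed_only) == n_trading_days * n_stocks  # FALSE: 9 rows, not 12
nrow(full_grid) == n_trading_days * n_stocks      # TRUE: 12 rows
```

With real calendar dates, full_seq would also generate weekends and holidays, so it is probably more sensible to pass your own vector of trading days, for example complete(date = trading_days, ticker), where trading_days is a vector you supply.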
Now we can safely do grouped aggregations by ticker, on the understanding that the data is the same size for all tickers, and we’ve removed one large source of potential analysis mishap…
tidydata %>%
  group_by(ticker) %>%
  summarise(count = n())
## # A tibble: 3 x 2
##   ticker count
##   <chr>  <int>
## 1 AMZN       3
## 2 FB         3
## 3 TSLA       3
Visit the Robot Wealth website for additional insight on this topic and to download the complete set of scripts: https://robotwealth.com/how-to-fill-gaps-in-large-stock-data-universes-using-tidyr-and-dplyr/
This material is from Robot Wealth and is being posted with its permission. The views expressed in this material are solely those of the author and/or Robot Wealth and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.