Time Series Classification Synthetic vs Real Financial Time Series – Part VIII

Posted March 17, 2021 at 11:43 am
Matthew Smith
Matthew Smith - R Blog

Learn which R packages and data sets you need by reviewing Part I, Part II, Part III, Part IV, Part V, Part VI and Part VII of this series.

That’s enough data analysis. I could probably also fit the PACF plots and do some more exploratory data analysis, but I will move on to generating the financial time series features using the tsfeatures package.

In the code below I take a random sample of 5 groups (using the whole data set takes too long to compute the time series features) and then apply all the functions in the tsfeatures package to each asset's time series. This is done by mapping over each asset's data and computing the time series features.

This section takes some time to process and compute (especially on the whole sample). I have already saved the results as a csv, so I will simply load the pre-computed time series features and work from those.

################# Generate Time Series Features ######################

# I create some time series features with the "tsfeatures" package. There are 40+ functions in the package,
# which can generate approximately 106 time series features.
# Due to memory issues it may only be possible to create a few of the features; in that case, randomly sample a
# subset of the functions (see the commented-out line below). We could also add technical indicators from the
# "PerformanceAnalytics" or "TTR" packages (I omit these here, however creating 'functions2 <- ls("package:TTR")'
# and adding it to the 'summarise' command will work.)

library(tidyverse)   # dplyr, tidyr and purrr verbs used below
library(magrittr)    # provides the tee pipe %T>%
library(tsfeatures)

functions <- ls("package:tsfeatures")[1:42]
# functions <- sample(functions, 20)

Stats <- df %>%
  group_by(row_id, class) %>%
  nest() %>%
  ungroup() %>%
  sample_n(5) %>%                       # random sample of 5 assets
  unnest(cols = data) %>%
  nest(data = -c(row_id, class)) %>%
  group_by(row_id, class) %T>%
  {options(warn = -1)} %>%              # silence warnings raised by the feature functions
  summarise(Statistics = map(data, ~ data.frame(
    bind_cols(
      tsfeatures(.x$value, functions))))) %>%
  unnest(Statistics)

# I saved the features for the whole data set as "Stats"; next I split it between training and test.
Stats <- read.csv("C:/Users/Matt/Desktop/Data Science Challenge/TSfeatures_train_val.csv")
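
Because computing the features is slow, the compute-once-then-reload pattern described above can be sketched as follows. This is a minimal illustration in base R, not the code used to produce the saved file: load_or_compute, slow_features and the temporary file path are hypothetical names introduced only for this example.

```r
# Hypothetical helper: compute an expensive result once, then reload it from disk.
# `path` and `compute_fn` are illustrative names, not part of the original code.
load_or_compute <- function(path, compute_fn) {
  if (file.exists(path)) {
    # Reuse the cached result instead of recomputing it
    return(read.csv(path))
  }
  result <- compute_fn()
  # row.names = FALSE avoids writing an extra index column to the file
  write.csv(result, path, row.names = FALSE)
  result
}

# Toy usage: a temporary file stands in for the real CSV,
# and a tiny data frame stands in for the slow feature computation.
cache_path    <- tempfile(fileext = ".csv")
slow_features <- function() data.frame(row_id = 1:3, feature = c(0.1, 0.2, 0.3))

first_run  <- load_or_compute(cache_path, slow_features)  # computes and writes the file
second_run <- load_or_compute(cache_path, slow_features)  # reads the cached file
```

On the second call the file already exists, so the slow function is never invoked.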

Note: Again, bad practice on my part. I simply named the new data Stats, which consists only of the time series features computed from df. It still refers only to the train_val.csv data and not the test.csv data.

After computing the time series features, each asset has been collapsed from ~260 days down to 1 single observation of time series features.

Recall that the goal here is to classify synthetic vs real time series, not to predict what the next day's price is going to be. For each asset I have a single observation, and based on this I can train a classification algorithm to distinguish between real and synthetic time series.
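
To make the idea of collapsing a series concrete, here is a minimal base R sketch on simulated data. It assumes a long-format data frame with row_id, class and value columns (the column names used in the code above) and computes only three simple features, standing in for the much larger set that tsfeatures produces. The toy data and the one_row_of_features helper are hypothetical.

```r
set.seed(1)  # only so the simulated toy data is repeatable

# Toy long-format data: 4 hypothetical assets with 260 daily values each
toy <- data.frame(
  row_id = rep(1:4, each = 260),
  class  = rep(c(0, 1, 0, 1), each = 260),
  value  = rnorm(4 * 260)
)

# Collapse one asset's series into a single row of summary features
one_row_of_features <- function(d) {
  data.frame(
    row_id = d$row_id[1],
    class  = d$class[1],
    mean   = mean(d$value),
    sd     = sd(d$value),
    # element 1 of $acf is lag 0, element 2 is the lag-1 autocorrelation
    acf1   = acf(d$value, lag.max = 1, plot = FALSE)$acf[2]
  )
}

# Apply the helper to each asset and stack the results
features <- do.call(rbind, lapply(split(toy, toy$row_id), one_row_of_features))

dim(toy)       # 1040 rows: one per asset per day
dim(features)  # 4 rows: one per asset
```

Each row of features can then be treated as an ordinary labelled observation for a classifier.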

How the training data looks:

Table 4: tsfeatures package features
X row_id class ac_9_ac_9 acf_features_x_acf1 acf_features_x_acf10 acf_features_diff1_acf1 acf_features_diff1_acf10 acf_features_diff2_acf1 acf_features_diff2_acf10 ARCH.LM autocorr_features_embed2_incircle_1 autocorr_features_embed2_incircle_2 autocorr_features_ac_9 autocorr_features_firstmin_ac autocorr_features_trev_num autocorr_features_motiftwo_entro3 autocorr_features_walker_propcross binarize_mean_binarize_mean binarize_mean_NA compengine_embed2_incircle_1 compengine_embed2_incircle_2 compengine_ac_9 compengine_firstmin_ac compengine_trev_num compengine_motiftwo_entro3 compengine_walker_propcross compengine_localsimple_mean1 compengine_localsimple_lfitac compengine_sampen_first compengine_std1st_der compengine_spreadrandomlocal_meantaul_50 compengine_spreadrandomlocal_meantaul_ac2 compengine_histogram_mode_10 compengine_outlierinclude_mdrmd compengine_fluctanal_prop_r1 crossing_points dist_features_histogram_mode_10 dist_features_outlierinclude_mdrmd embed2_incircle entropy firstmin_ac firstzero_ac flat_spots fluctanal_prop_r1_fluctanal_prop_r1 arch_acf garch_acf arch_r2 garch_r2 histogram_mode alpha beta hurst hw_parameters_hw_parameters hw_parameters_NA localsimple_taures lumpiness max_kl_shift time_kl_shift max_level_shift time_level_shift max_var_shift time_var_shift motiftwo_entro3 nonlinearity outlierinclude_mdrmd x_pacf5 diff1x_pacf5 diff2x_pacf5 pred_features_localsimple_mean1 pred_features_localsimple_lfitac pred_features_sampen_first sampen_first_sampen_first sampenc scal_features_fluctanal_prop_r1 spreadrandomlocal_meantaul stability station_features_std1st_der station_features_spreadrandomlocal_meantaul_50 station_features_spreadrandomlocal_meantaul_ac2 std1st_der_std1st_der nperiods seasonal_period trend spike linearity curvature e_acf1 e_acf10 trev_num tsfeatures_frequency tsfeatures_nperiods tsfeatures_seasonal_period tsfeatures_trend tsfeatures_spike tsfeatures_linearity tsfeatures_curvature tsfeatures_e_acf1 tsfeatures_e_acf10 tsfeatures_entropy tsfeatures_x_acf1 tsfeatures_x_acf10 tsfeatures_diff1_acf1 tsfeatures_diff1_acf10 tsfeatures_diff2_acf1 tsfeatures_diff2_acf10 unitroot_kpss unitroot_pp walker_propcross
1 1 0 -0.0675275 0.0097094 0.0526897 -0.5005299 0.3297018 -0.6772403 0.6124739 0.0627825 0.3929961 0.6147860 -0.0675275 1 0.1208750 2.071663 0.5405405 1 1 0.3929961 0.6147860 -0.0675275 1 0.1208750 2.071663 0.5405405 1 1 1.788841 1.408737 1.68 1.43 -0.25 -0.2865385 0.1627907 132 -0.25 -0.2865385 0.3929961 0.9840151 1 3 4 0.1627907 0.0652585 0.0154406 0.0627825 0.0253367 -0.25 0.0013330 0.0013330 0.5000458 NA NA 1 0.3556536 1.783636 103 1.297736 97 2.819828 46 2.071663 0.0752319 -0.2865385 0.0108653 0.4457792 1.0525222 1 1 1.788841 1.788841 1.788841 0.1627907 1.76 0.0562693 1.408737 1.74 1.36 1.408737 0 1 0.0043052 0.0000261 0.8421403 -0.7069160 0.0052389 0.0588324 0.1208750 1 0 1 0.0043052 0.0000261 0.8421403 -0.7069160 0.0052389 0.0588324 0.9840151 0.0097094 0.0526897 -0.5005299 0.3297018 -0.6772403 0.6124739 0.0993829 -249.7732 0.5405405
2 2 0 -0.0421577 -0.0075902 0.0387481 -0.5171529 0.3129147 -0.6727897 0.5379301 0.0558032 0.4285714 0.6563707 -0.0421577 1 -0.4765229 2.077581 0.5019305 1 1 0.4285714 0.6563707 -0.0421577 1 -0.4765229 2.077581 0.5019305 1 1 1.780390 1.419266 1.95 1.00 0.50 0.2615385 0.1627907 123 0.50 0.2615385 0.4285714 0.9864332 1 1 4 0.1627907 0.0664358 0.0657859 0.0558032 0.0554355 0.50 0.0001000 0.0001000 0.5000458 NA NA 1 0.4636768 1.733008 247 1.311861 141 2.625772 221 2.077581 0.0273335 0.2615385 0.0256032 0.4606850 1.0171377 1 1 1.780390 1.780390 1.780390 0.1627907 2.05 0.0892206 1.419266 2.12 1.00 1.419266 0 1 0.0177460 0.0000399 0.9249561 0.7665407 -0.0218053 0.0411861 -0.4765229 1 0 1 0.0177460 0.0000399 0.9249561 0.7665407 -0.0218053 0.0411861 0.9864332 -0.0075902 0.0387481 -0.5171529 0.3129147 -0.6727897 0.5379301 0.0414599 -256.0485 0.5019305
3 3 1 0.0099598 -0.0405929 0.0449036 -0.5026683 0.3471209 -0.6718885 0.6109006 0.0325470 0.4671815 0.7065637 0.0099598 1 -0.8755173 2.069233 0.5328185 1 0 0.4671815 0.7065637 0.0099598 1 -0.8755173 2.069233 0.5328185 1 1 1.706841 1.443315 1.38 1.00 -0.50 -0.2538462 0.1395349 132 -0.50 -0.2538462 0.4671815 0.9868568 1 1 6 0.1395349 0.0388513 0.0039162 0.0325470 0.0041902 -0.50 0.0014557 0.0014557 0.5000458 NA NA 1 1.2670493 7.746711 95 1.403784 87 5.235499 84 2.069233 0.2436499 -0.2538462 0.0223069 0.5356408 0.9954919 1 1 1.706841 1.706841 1.706841 0.1395349 1.42 0.0716499 1.443315 1.42 1.00 1.443315 0 1 0.0141368 0.0000929 0.8414359 -0.0259311 -0.0547484 0.0492987 -0.8755173 1 0 1 0.0141368 0.0000929 0.8414359 -0.0259311 -0.0547484 0.0492987 0.9868568 -0.0405929 0.0449036 -0.5026683 0.3471209 -0.6718885 0.6109006 0.0775698 -258.1295 0.5328185
4 4 0 -0.0428748 -0.0443619 0.0615867 -0.4571442 0.3184053 -0.5906478 0.4361178 0.1275576 0.4555985 0.7027027 -0.0428748 2 -0.9943808 2.068744 0.4903475 0 0 0.4555985 0.7027027 -0.0428748 2 -0.9943808 2.068744 0.4903475 1 1 1.660825 1.445807 1.24 1.00 0.25 0.0153846 0.1395349 127 0.25 0.0153846 0.4555985 0.9790521 2 1 7 0.1395349 0.0694296 0.0112709 0.0579144 0.0123884 0.25 0.0480021 0.0001000 0.5000458 NA NA 1 1.0068624 4.994753 132 1.258758 173 5.886911 156 2.068744 0.3840091 0.0153846 0.0503205 0.5402603 1.1070217 1 1 1.660825 1.660825 1.660825 0.1395349 1.10 0.1065111 1.445807 1.14 1.00 1.445807 0 1 0.0283540 0.0000482 -1.2297854 0.2921899 -0.0728152 0.0752389 -0.9943808 1 0 1 0.0283540 0.0000482 -1.2297854 0.2921899 -0.0728152 0.0752389 0.9790521 -0.0443619 0.0615867 -0.4571442 0.3184053 -0.5906478 0.4361178 0.2129633 -262.0781 0.4903475
5 5 0 0.0259312 -0.2447835 0.1469130 -0.5810073 0.4796508 -0.6799229 0.6232529 0.2014861 0.6563707 0.7992278 0.0259312 1 -0.7167079 2.059764 0.5289575 1 0 0.6563707 0.7992278 0.0259312 1 -0.7167079 2.059764 0.5289575 1 1 1.347789 1.580825 1.08 0.98 -0.50 0.7961538 0.1627907 133 -0.50 0.7961538 0.6563707 0.9723766 1 1 9 0.1627907 0.2718058 0.2229375 0.1765130 0.1330761 -0.50 0.0001000 0.0001000 0.5000458 NA NA 1 2.8846415 11.474426 80 1.772392 229 8.468236 236 2.059764 0.2143595 0.7961538 0.1008392 0.7538746 1.2926800 1 1 1.347789 1.347789 1.347789 0.1627907 1.08 0.0797924 1.580825 1.06 0.98 1.580825 0 1 0.0121072 0.0001568 -0.5488436 0.2255538 -0.2599764 0.1558209 -0.7167079 1 0 1 0.0121072 0.0001568 -0.5488436 0.2255538 -0.2599764 0.1558209 0.9723766 -0.2447835 0.1469130 -0.5810073 0.4796508 -0.6799229 0.6232529 0.1506344 -323.5672 0.5289575
6 6 0 -0.0761166 0.0468556 0.0858348 -0.5253131 0.3438031 -0.6901570 0.6130725 0.0432628 0.4352941 0.6627451 -0.0761166 1 0.0898648 2.068914 0.5250965 1 1 0.4352941 0.6627451 -0.0761166 1 0.0898648 2.068914 0.5250965 1 1 1.751575 1.381854 2.69 1.71 -0.25 -0.0846154 0.3488372 134 -0.25 -0.0846154 0.4352941 0.9806218 1 5 5 0.3488372 0.0500806 0.0502154 0.0627968 0.0620877 -0.25 0.0286244 0.0001000 0.5188805 NA NA 1 0.2189481 3.145763 141 1.447883 80 2.077936 84 2.068914 0.0137733 -0.0846154 0.0172321 0.4345976 1.0881798 1 1 1.751575 1.751575 1.751575 0.3488372 2.61 0.1479673 1.381854 2.63 1.81 1.381854 0 1 0.0077481 0.0000329 -0.5473782 0.4505809 0.0410068 0.0873468 0.0898648 1 0 1 0.0077481 0.0000329 -0.5473782 0.4505809 0.0410068 0.0873468 0.9806218 0.0468556 0.0858348 -0.5253131 0.3438031 -0.6901570 0.6130725 0.0259414 -262.3484 0.5250965

## [1] 12000 109

The data still has 12,000 rows, now with 109 columns: X, row_id and class, plus 106 time series features created by the tsfeatures package. That is, we have 6,000 synthetic and 6,000 real financial time series (12,000 * ~260 = 3,120,000 daily observations, but applying tsfeatures collapsed the ~260 days down to 1 single observation for each asset).
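
The arithmetic behind these dimensions can be checked in a few lines of R. This is only a sketch using the approximate figures quoted in the text (about 260 days per series); the variable names are made up for the example.

```r
n_series        <- 12000   # 6,000 synthetic + 6,000 real series
days_per_series <- 260     # approximate length of each series

# Rows in the original long-format data (one row per asset per day)
long_rows <- n_series * days_per_series

# Columns in the feature table: identifiers and label plus the features
n_cols     <- 109
id_cols    <- c("X", "row_id", "class")
n_features <- n_cols - length(id_cols)

long_rows   # 3120000
n_features  # 106
```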

I have collapsed this from a time series prediction problem into a pure classification problem. Next, I split the data into a training set and a validation set, and then into x_train, y_train, etc.

I split the Stats data set into a training set with 75% of the observations and an in-sample validation set with the remaining 25%.

######################################################################
################# Train an XGBoost model on the TS Features ##########

library(dplyr)

#Stats <- Stats %>%
#  select_if(~sum(!is.na(.)) > 0)

# Split the training set up between train and a small validation set
smp_size <- floor(0.75 * nrow(Stats))
#set.seed(123)   # uncomment to make the random split reproducible
train_ind <- sample(seq_len(nrow(Stats)), size = smp_size)

train <- Stats[train_ind, ]
val <- Stats[-train_ind, ]
# We have 106 time series features for the model to learn from.

# Feature matrices: drop the label (class) and the identifier columns (row_id, X)
x_train <- train %>%
  ungroup() %>%
  select(-class, -row_id, -X) %>%
  as.matrix()

x_val <- val %>%
  ungroup() %>%
  select(-class, -row_id, -X) %>%
  as.matrix()

# Labels: the class column
y_train <- train %>%
  ungroup() %>%
  pull(class)
y_val <- val %>%
  ungroup() %>%
  pull(class)
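
For readers who want to try the split without the real feature file, the following base R sketch repeats the same 75/25 logic on a small simulated table and adds a few sanity checks. The toy_stats table and its feat_a and feat_b columns are hypothetical stand-ins. Fixing the seed is a choice made here so the example is repeatable; the code above leaves set.seed commented out.

```r
set.seed(123)  # fixing the seed makes the random split repeatable

# Toy stand-in for the feature table: 100 rows with two made-up features
toy_stats <- data.frame(
  X      = 1:100,
  row_id = 1:100,
  class  = rep(c(0, 1), 50),
  feat_a = rnorm(100),
  feat_b = rnorm(100)
)

# Same split logic as above: 75% train, 25% validation
smp_size  <- floor(0.75 * nrow(toy_stats))
train_ind <- sample(seq_len(nrow(toy_stats)), size = smp_size)

toy_train <- toy_stats[train_ind, ]
toy_val   <- toy_stats[-train_ind, ]

# Drop identifier and label columns to get the feature matrix
id_cols <- c("X", "row_id", "class")
x_toy   <- as.matrix(toy_train[, setdiff(names(toy_train), id_cols)])
y_toy   <- toy_train$class

# Sanity checks: sizes add up and no row appears in both sets
stopifnot(nrow(toy_train) + nrow(toy_val) == nrow(toy_stats))
stopifnot(length(intersect(toy_train$row_id, toy_val$row_id)) == 0)

table(toy_train$class)  # class balance in the training split
```

Checking the class balance after a purely random split is a quick way to see whether a stratified split might be worth considering.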

Stay tuned for the next installment to find out how the training X (input variables) data looks.

Visit Matthew Smith – R Blog to download the complete R code and see additional details featured in this tutorial: https://lf0.com/post/synth-real-time-series/financial-time-series/

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from Matthew Smith - R Blog and is being posted with its permission. The views expressed in this material are solely those of the author and/or Matthew Smith - R Blog and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
