Learn how to create new dataframes with Part I, how to perform basic mathematical operations in Part II and see Part III for instructions on how to use the package RDatasets.jl.
Dealing with missing data
Julia has a “missing” object that is used for unavailable data. You can use skipmissing() function to perform operations ignoring the missing values.
Output:
a | b |
---|---|
Int64? | String? |
1 | Apple |
missing | Orange |
3 | missing |
7 | Grapes |
You can use dropmissing() function to remove the missing values.
a | b |
---|---|
Int64 | String |
1 | Apple |
7 | Grapes |
More details for dealing with missing values can be found here.
Importing and exporting data as CSV and Excel files
Reading data is the first step in analysing any kind of data. Most of the information we come across is either in CSV or excel format, so we’ll focus on these two. We will work with CSV.jl and XLSX.jl for dealing with CSV and Excel files.
Reading and writing CSV files
We’ll read a CSV file (infy.csv), as a dataframe, containing historical stock price data for Infosys downloaded from Yahoo finance for the period 21-Dec-2020 to 22-Dec-2021.
Here’s a summary for this data.
variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | DataType |
Date | 2020-12-22 | 2021-12-21 | 0 | Date | ||
Open | 20.5674 | 16.39 | 20.63 | 24.05 | 0 | Float64 |
High | 20.7164 | 16.69 | 20.775 | 24.5 | 0 | Float64 |
Low | 20.4097 | 16.36 | 20.51 | 23.94 | 0 | Float64 |
Close | 20.5685 | 16.58 | 20.725 | 24.22 | 0 | Float64 |
Adj Close | 20.3422 | 16.2664 | 20.5451 | 24.22 | 0 | Float64 |
Volume | 7.09982e6 | 1320600 | 6.43815e6 | 22911800 | 0 | Int64 |
Here, we calculate the range –
Date | Open | High | Low | Close | Adj Close | Volume | range |
---|---|---|---|---|---|---|---|
Date | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 | Float64 |
2020-12-22 | 16.39 | 16.74 | 16.36 | 16.58 | 16.2664 | 6714400 | 0.379999 |
2020-12-23 | 16.9 | 16.93 | 16.57 | 16.59 | 16.2762 | 5913500 | 0.36 |
2020-12-24 | 16.68 | 16.69 | 16.52 | 16.6 | 16.286 | 1320600 | 0.170001 |
2020-12-28 | 16.73 | 16.84 | 16.72 | 16.77 | 16.4528 | 4239300 | 0.120001 |
2020-12-29 | 16.9 | 16.9 | 16.67 | 16.76 | 16.443 | 8473700 | 0.23 |
2020-12-30 | 16.87 | 17.0 | 16.83 | 16.93 | 16.6098 | 3877200 | 0.17 |
2020-12-31 | 17.01 | 17.03 | 16.89 | 16.95 | 16.6294 | 3693700 | 0.140002 |
2021-01-04 | 17.39 | 17.43 | 17.06 | 17.25 | 16.9237 | 12597600 | 0.370001 |
2021-01-05 | 17.32 | 17.67 | 17.32 | 17.65 | 17.3162 | 8109900 | 0.35 |
2021-01-06 | 17.4 | 17.79 | 17.34 | 17.73 | 17.3946 | 9136300 | 0.450001 |
2021-01-07 | 17.36 | 17.55 | 17.26 | 17.55 | 17.2181 | 10272000 | 0.289999 |
2021-01-08 | 18.07 | 18.61 | 18.02 | 18.59 | 18.2384 | 17802400 | 0.590001 |
2021-01-11 | 18.68 | 18.86 | 18.55 | 18.76 | 18.4052 | 12220600 | 0.310002 |
2021-01-12 | 18.92 | 18.94 | 18.54 | 18.6 | 18.2482 | 10629100 | 0.4 |
2021-01-13 | 19.03 | 19.07 | 18.4 | 18.43 | 18.0814 | 18409900 | 0.67 |
2021-01-14 | 18.57 | 18.65 | 18.14 | 18.22 | 17.8754 | 13286100 | 0.510001 |
2021-01-15 | 18.19 | 18.38 | 18.11 | 18.17 | 17.8263 | 7443000 | 0.269998 |
2021-01-19 | 18.08 | 18.18 | 17.95 | 18.12 | 17.7773 | 7179600 | 0.229999 |
2021-01-20 | 18.37 | 18.47 | 18.29 | 18.4 | 18.052 | 5408500 | 0.179998 |
2021-01-21 | 18.39 | 18.4 | 18.15 | 18.2 | 17.8558 | 7963400 | 0.25 |
2021-01-22 | 18.23 | 18.27 | 18.06 | 18.18 | 17.8361 | 5663500 | 0.210001 |
2021-01-25 | 18.15 | 18.22 | 17.84 | 17.92 | 17.5811 | 6012600 | 0.379999 |
2021-01-26 | 17.92 | 17.92 | 17.75 | 17.85 | 17.5124 | 5472600 | 0.17 |
2021-01-27 | 17.65 | 17.89 | 17.44 | 17.47 | 17.1396 | 11388300 | 0.449998 |
2021-01-28 | 17.46 | 17.75 | 17.41 | 17.64 | 17.3064 | 7877600 | 0.34 |
2021-01-29 | 17.16 | 17.23 | 16.88 | 16.88 | 16.5607 | 9671400 | 0.350001 |
2021-02-01 | 17.19 | 17.42 | 17.05 | 17.38 | 17.0513 | 5829200 | 0.370001 |
2021-02-02 | 17.45 | 17.51 | 17.34 | 17.44 | 17.1101 | 4119800 | 0.17 |
2021-02-03 | 17.6 | 17.75 | 17.49 | 17.65 | 17.3162 | 4677800 | 0.26 |
2021-02-04 | 17.54 | 17.64 | 17.36 | 17.59 | 17.2573 | 4439600 | 0.279998 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
This updated dataframe can be saved using CSV.write() function.
Reading and writing excel files
We’ll use the XLSX.jl package in Julia to read and write excel files.
Here’s how it can be done –
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
Any | Any | Any | Any | Any | Any | Any |
2020-12-22 | 16.39 | 16.74 | 16.36 | 16.58 | 16.2664 | 6714400 |
2020-12-23 | 16.9 | 16.93 | 16.57 | 16.59 | 16.2762 | 5913500 |
2020-12-24 | 16.68 | 16.69 | 16.52 | 16.6 | 16.286 | 1320600 |
2020-12-28 | 16.73 | 16.84 | 16.72 | 16.77 | 16.4528 | 4239300 |
2020-12-29 | 16.9 | 16.9 | 16.67 | 16.76 | 16.443 | 8473700 |
2020-12-30 | 16.87 | 17.0 | 16.83 | 16.93 | 16.6098 | 3877200 |
2020-12-31 | 17.01 | 17.03 | 16.89 | 16.95 | 16.6294 | 3693700 |
2021-01-04 | 17.39 | 17.43 | 17.06 | 17.25 | 16.9237 | 12597600 |
2021-01-05 | 17.32 | 17.67 | 17.32 | 17.65 | 17.3162 | 8109900 |
2021-01-06 | 17.4 | 17.79 | 17.34 | 17.73 | 17.3946 | 9136300 |
2021-01-07 | 17.36 | 17.55 | 17.26 | 17.55 | 17.2181 | 10272000 |
2021-01-08 | 18.07 | 18.61 | 18.02 | 18.59 | 18.2384 | 17802400 |
2021-01-11 | 18.68 | 18.86 | 18.55 | 18.76 | 18.4052 | 12220600 |
2021-01-12 | 18.92 | 18.94 | 18.54 | 18.6 | 18.2482 | 10629100 |
2021-01-13 | 19.03 | 19.07 | 18.4 | 18.43 | 18.0814 | 18409900 |
2021-01-14 | 18.57 | 18.65 | 18.14 | 18.22 | 17.8754 | 13286100 |
2021-01-15 | 18.19 | 18.38 | 18.11 | 18.17 | 17.8263 | 7443000 |
2021-01-19 | 18.08 | 18.18 | 17.95 | 18.12 | 17.7773 | 7179600 |
2021-01-20 | 18.37 | 18.47 | 18.29 | 18.4 | 18.052 | 5408500 |
2021-01-21 | 18.39 | 18.4 | 18.15 | 18.2 | 17.8558 | 7963400 |
2021-01-22 | 18.23 | 18.27 | 18.06 | 18.18 | 17.8361 | 5663500 |
2021-01-25 | 18.15 | 18.22 | 17.84 | 17.92 | 17.5811 | 6012600 |
2021-01-26 | 17.92 | 17.92 | 17.75 | 17.85 | 17.5124 | 5472600 |
2021-01-27 | 17.65 | 17.89 | 17.44 | 17.47 | 17.1396 | 11388300 |
2021-01-28 | 17.46 | 17.75 | 17.41 | 17.64 | 17.3064 | 7877600 |
2021-01-29 | 17.16 | 17.23 | 16.88 | 16.88 | 16.5607 | 9671400 |
2021-02-01 | 17.19 | 17.42 | 17.05 | 17.38 | 17.0513 | 5829200 |
2021-02-02 | 17.45 | 17.51 | 17.34 | 17.44 | 17.1101 | 4119800 |
2021-02-03 | 17.6 | 17.75 | 17.49 | 17.65 | 17.3162 | 4677800 |
2021-02-04 | 17.54 | 17.64 | 17.36 | 17.59 | 17.2573 | 4439600 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
We can write an excel file using the writetable() function.
Julia has in-built read() and write() open() close() functions to work with text files. More details can be found here.
Data can be written in .jld format as well. .jld is Julia’s data format built using the JLD.jl package.
Details for the following packages can be found here –
Stay tuned for the next installment, in which Anshul Tayal will present how to create scripts for data visualization.
Visit QuantInsti to read the full article: https://blog.quantinsti.com/data-manipulation-visualization-using-julia/.
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.