Faster Data Exploration with DataExplorer

Data exploration is an important part of the modeling process, and it can take up a fair amount of time. The awesome DataExplorer package in R aims to make this process faster and easier. To get started with DataExplorer, install it from CRAN:

install.packages("DataExplorer")

Let’s use DataExplorer to explore a dataset on diabetes.

# load DataExplorer
library(DataExplorer)
 
# read in dataset
diabetes_data <- read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv", header = FALSE)
 
# fix column names
names(diabetes_data) <- c("number_of_times_pregnant", "plasma_glucose_conc", "diastolic_bp", "triceps_skinfold_thickness", "two_hr_serum_insulin", "bmi", "diabetes_pedigree_function", "age", "label")
 
# create report
create_report(diabetes_data)

Running the create_report line above generates an HTML report file (report.html in the working directory by default) containing a collection of useful information about the data. This includes:

Basic statistics, such as number of rows and columns, number of columns with missing data, count of continuous variables vs. discrete, and the total memory allocation
Data type for each field
Missing data percentages for each column
Univariate distribution for each column
QQ plots
Correlation analysis
PCA

That’s right – a single line of code can generate all of the above for a given dataset! It’s also possible to get each of these pieces individually. For example, in a single line of code, we can generate histograms for all the numeric variables in the dataset.

plot_histogram(diabetes_data)
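The other sections of the report have standalone functions as well. The sketch below is a minimal illustration, not part of the original example: it assumes R's built-in airquality dataset so that the snippet runs without a download, and it calls introduce, plot_intro, plot_missing, plot_qq and plot_prcomp from DataExplorer. The same calls should work on diabetes_data.

```r
library(DataExplorer)

# airquality ships with R and contains some missing values,
# which makes the missing-data plot more informative.
data(airquality)

# Basic statistics as a one-row table (rows, columns, missing values, ...)
intro <- introduce(airquality)
print(intro)

# The same statistics shown as a percentage chart
plot_intro(airquality)

# Percentage of missing values per column
plot_missing(airquality)

# Drop incomplete rows before the QQ plots and the PCA
aq_complete <- na.omit(airquality)

# QQ plots for the continuous columns
plot_qq(aq_complete)

# Principal component analysis
plot_prcomp(aq_complete)
```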

Similarly, we can get bar plots for all categorical variables in the dataset:

plot_bar(diabetes_data)
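plot_bar draws discrete columns, such as character or factor columns. As a minimal sketch, the snippet below assumes the built-in iris dataset (chosen for illustration only; it is not part of the original example), whose Species column is a factor, and prints the frequency counts that the bars represent.

```r
library(DataExplorer)

# iris ships with R; Species is a factor, so it counts as a discrete column
species_counts <- table(iris$Species)
print(species_counts)

# One bar chart per discrete column (here only Species)
plot_bar(iris)
```

To get a bar plot for the diabetes outcome, one option would be to convert it to a factor first, for example with diabetes_data$label <- as.factor(diabetes_data$label).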

Here’s an example getting the correlation plot:

plot_correlation(diabetes_data)
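For readers who want the numbers behind the heatmap, base R's cor function returns the correlation matrix directly. The sketch below assumes the built-in airquality dataset as a stand-in for the diabetes data. Because airquality contains missing values, pairwise complete observations are used, and the same option is handed to plot_correlation through its cor_args argument.

```r
library(DataExplorer)

# Correlation matrix from base R, handling missing values pair by pair
cor_matrix <- cor(airquality, use = "pairwise.complete.obs")
print(round(cor_matrix, 2))

# Same setting forwarded to cor() by plot_correlation
plot_correlation(airquality, cor_args = list(use = "pairwise.complete.obs"))
```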

Configuring the report

It’s also possible to adjust the output generated by create_report. For example, if you don’t want the QQ plots, set add_plot_qq = FALSE in configure_report and pass the resulting configuration to create_report:

config <- configure_report(add_plot_qq = FALSE)
 
create_report(diabetes_data, config = config)
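configure_report has one add_* switch per report section, and create_report accepts a few arguments of its own. The following is a minimal sketch on the built-in airquality dataset (an assumption for illustration). The file name airquality_report.html and the choice of Ozone as the response variable are arbitrary. Rendering the report requires the rmarkdown package and pandoc.

```r
library(DataExplorer)

# Turn off several report sections at once
config <- configure_report(
  add_plot_qq = FALSE,
  add_plot_prcomp = FALSE
)

# Write the report to a temporary directory under a custom file name;
# y marks the response variable used for the box plots and scatter plots
create_report(
  airquality,
  output_file = "airquality_report.html",
  output_dir = tempdir(),
  y = "Ozone",
  config = config
)
```

For the diabetes data, the equivalent call would be create_report(diabetes_data, y = "label", config = config).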

One hot encoding

DataExplorer also comes with a function to perform one hot encoding. You can one hot encode all the categorical variables in a dataset by passing the data frame to the dummify function. In this case, we don’t have any categorical variables to encode, so the function will generate a warning.

dummify(diabetes_data)
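To see what dummify produces when a categorical column is present, here is a minimal sketch on the built-in iris dataset (an assumption for illustration, since the diabetes data has no categorical columns). Its Species factor should be expanded into one indicator column per level.

```r
library(DataExplorer)

# iris has four numeric columns and one factor column (Species)
encoded <- dummify(iris)

# Species is replaced by indicator columns, one per level
print(names(encoded))
print(head(encoded))
```

dummify also takes a select argument, which can limit the encoding to chosen columns.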

Visit TheAutomatic.net blog for additional insight on this topic and to find DataExplorer scripts and documentation.

Disclosure: Interactive Brokers Third Party

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from TheAutomatic.net and is being posted with its permission. The views expressed in this material are solely those of the author and/or TheAutomatic.net and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
