{"id":96139,"date":"2021-08-27T11:30:00","date_gmt":"2021-08-27T15:30:00","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=96139"},"modified":"2022-11-21T09:47:57","modified_gmt":"2022-11-21T14:47:57","slug":"automated-eda-with-python","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/","title":{"rendered":"Automated EDA with Python"},"content":{"rendered":"\n<p>In this post, we will investigate the\u00a0<strong>pandas_profiling<\/strong>\u00a0and\u00a0<strong>sweetviz<\/strong>\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python. In a previous article, we talked about an analogous package in R (<a href=\"https:\/\/theautomatic.net\/2021\/03\/03\/faster-data-exploration-with-dataexplorer\/\">see this link<\/a>).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-getting-started-with-pandas_profiling\"><strong>Getting started with pandas_profiling<\/strong><\/h2>\n\n\n\n<p><strong>pandas_profiling<\/strong>&nbsp;can be installed using pip, like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install pandas-profiling&#91;notebook]<\/code><\/pre>\n\n\n\n<p>Next, let\u2019s read in our dataset. 
The data we\u2019ll be using is a heart attack-related dataset, which can be found&nbsp;<a href=\"https:\/\/www.kaggle.com\/rashikrahmanpritom\/heart-attack-analysis-prediction-dataset\">here<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\n \nheart_data = pd.read_csv(\"heart.csv\")\n \nheart_data.head()\n<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"639\" height=\"264\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-pandas_profiling-automatic-net.png\" alt=\"\" class=\"wp-image-98964 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-pandas_profiling-automatic-net.png 639w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-pandas_profiling-automatic-net-300x124.png 300w\" data-sizes=\"(max-width: 639px) 100vw, 639px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 639px; aspect-ratio: 639\/264;\" \/><\/figure>\n\n\n\n<p>Now, let\u2019s import&nbsp;<em>ProfileReport<\/em>&nbsp;from&nbsp;<strong>pandas_profiling<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pandas_profiling import ProfileReport\n \nreport = ProfileReport(heart_data, title = \"Sample Report\")\n \nreport<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"634\" height=\"92\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-eda-automatic-net.png\" alt=\"\" class=\"wp-image-98968 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-eda-automatic-net.png 634w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-eda-automatic-net-300x44.png 300w\" data-sizes=\"(max-width: 634px) 100vw, 634px\" 
src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 634px; aspect-ratio: 634\/92;\" \/><\/figure>\n\n\n\n<p>If you\u2019re running this code in Jupyter Notebook, you should see the report generated within your notebook file.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"640\" height=\"342\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-data-analysis-automatic-net.png\" alt=\"\" class=\"wp-image-98971 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-data-analysis-automatic-net.png 640w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-data-analysis-automatic-net-300x160.png 300w\" data-sizes=\"(max-width: 640px) 100vw, 640px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 640px; aspect-ratio: 640\/342;\" \/><\/figure>\n\n\n\n<p>The report shows several pieces of analysis. First, it gives a summary of the data: the number of variables and observations, the count and percentage of missing values, data type information, and the number of duplicate rows (if any).<\/p>\n\n\n\n<p>Additionally, there are several other tabs available in the report. For example, the package automatically generates correlation heatmaps, like the one below. 
Also, it\u2019s possible to see the number of missing values in each column.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"640\" height=\"509\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-correlation-heatmap-automatic-net.png\" alt=\"\" class=\"wp-image-98975 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-correlation-heatmap-automatic-net.png 640w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/python-correlation-heatmap-automatic-net-300x239.png 300w\" data-sizes=\"(max-width: 640px) 100vw, 640px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 640px; aspect-ratio: 640\/509;\" \/><\/figure>\n\n\n\n<p>For each variable, the Variables tab shows the number of distinct and missing values, along with a histogram (for numeric variables) or a bar plot (for categorical variables).<\/p>\n\n\n\n<p>If you want to save the report to an external HTML file, you can do that by using the&nbsp;<em>to_file<\/em>&nbsp;method.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># save the report to an HTML file\nreport.to_file(\"report_file.html\")<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Handling larger datasets<\/strong><\/h2>\n\n\n\n<p>Profiling larger datasets with the default settings can be slow and memory-intensive, but a few adjustments make it manageable. 
The&nbsp;<em>ProfileReport<\/em>&nbsp;class has a parameter called&nbsp;<em>minimal<\/em>, which we can set to True to reduce the number of computations performed.<\/p>\n\n\n\n<p>This setting generates a report with only the Overview and Variables tabs, so you can still see a visualization of each variable, missing value counts, and a few summary statistics.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>min_report = ProfileReport(heart_data, title = \"Minimal Report\", minimal = True)\n \nmin_report<\/code><\/pre>\n\n\n\n<p>If you want the full profile of a large dataset, consider profiling a random sample of rows first (for example, one drawn with the pandas&nbsp;<em>sample<\/em>&nbsp;method) to keep run time and memory use manageable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The sweetviz library<\/strong><\/h2>\n\n\n\n<p>An alternative to&nbsp;<strong>pandas_profiling<\/strong>&nbsp;is the&nbsp;<strong>sweetviz<\/strong>&nbsp;library, which can also generate an automated EDA report. We can install&nbsp;<strong>sweetviz<\/strong>&nbsp;using pip:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install sweetviz<\/code><\/pre>\n\n\n\n<p>Similar to&nbsp;<strong>pandas_profiling<\/strong>, you can generate an EDA report with a short code snippet:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import sweetviz as sv\n \n# generate report\nsweet_report = sv.analyze(heart_data)\n \n# show the output within the notebook\nsweet_report.show_notebook()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"639\" height=\"393\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/sweetviz-python-automatic-net.png\" alt=\"\" class=\"wp-image-98986 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/sweetviz-python-automatic-net.png 639w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2021\/08\/sweetviz-python-automatic-net-300x185.png 300w\" data-sizes=\"(max-width: 639px) 100vw, 639px\" 
src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 639px; aspect-ratio: 639\/393;\" \/><\/figure>\n\n\n\n<p>Clicking on any variable\u2019s tab expands the analysis for that variable. This analysis shows descriptive statistics, a histogram (or bar plot) of the variable\u2019s distribution, and information about its most frequently occurring values. Additionally,&nbsp;<strong>sweetviz<\/strong>&nbsp;generates a heatmap showing the associations between the variables in the dataset. Between pairs of continuous variables, the heatmap shows the Pearson correlation coefficient. Between continuous and categorical variables, it shows the correlation ratio. Between pairs of categorical variables, it shows the uncertainty coefficient.<\/p>\n\n\n\n<p>The&nbsp;<em>show_notebook<\/em>&nbsp;method above generates the report within Jupyter Notebook. If you want to create an external HTML file, you can use&nbsp;<em>show_html<\/em>, like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># save the report to an HTML file (by default, it also opens in your browser)\nsweet_report.show_html(\"sweetviz_report.html\")<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Comparing datasets with sweetviz<\/strong><\/h2>\n\n\n\n<p><strong>Sweetviz<\/strong>&nbsp;can also be used to compare two datasets. For example, you can use it to compare a training dataset against a validation dataset.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># split the data 80\/20 into training and validation sets\ntrain_data = heart_data.sample(frac = 0.8, random_state = 42)\nval_data = heart_data.drop(train_data.index)\n \n# the third argument, \"output\", is the dataset\u2019s target column\ncompare_report = sv.compare(&#91;train_data, \"Train\"], &#91;val_data, \"Validation\"], \"output\")\n \ncompare_report.show_notebook()  # show output within notebook file<\/code><\/pre>\n\n\n\n<p>This report is similar to the one above, except that it breaks out the analysis by dataset, so the statistics for the training and validation data can be compared directly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>That\u2019s all for now! If you enjoyed this post, please share it with your friends. 
To learn more about the packages discussed in this post, check out these links:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/pypi.org\/project\/pandas-profiling\/\">https:\/\/pypi.org\/project\/pandas-profiling\/<\/a><\/li><li><a href=\"https:\/\/pypi.org\/project\/sweetviz\/\">https:\/\/pypi.org\/project\/sweetviz\/<\/a><\/li><\/ul>\n\n\n\n<p><em>Visit TheAutomatic.net for additional insight on this topic: <a href=\"https:\/\/theautomatic.net\/2021\/07\/02\/automated-eda-with-python\/\">https:\/\/theautomatic.net\/2021\/07\/02\/automated-eda-with-python\/<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python. <\/p>\n","protected":false},"author":388,"featured_media":47736,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,338,341,352,344,342],"tags":[806,6901,6650,10082,595,10083],"contributors-categories":[13695],"class_list":{"0":"post-96139","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-ibkr-quant-news","10":"category-quant-development","11":"category-quant-north-america","12":"category-quant-regions","13":"category-r-development","14":"tag-data-science","15":"tag-exploratory-data-analysis","16":"tag-histogram","17":"tag-pandas_profiling","18":"tag-python","19":"tag-sweetviz","20":"contributors-categories-theautomatic-net"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Automated EDA with Python | 
IBKR Quant<\/title>\n<meta name=\"description\" content=\"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/96139\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automated EDA with Python | IBKR Quant Blog\" \/>\n<meta property=\"og:description\" content=\"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-27T15:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-21T14:47:57+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"550\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrew Treadway\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Treadway\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Andrew Treadway\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\"\n\t            },\n\t            \"headline\": \"Automated EDA with Python\",\n\t            \"datePublished\": \"2021-08-27T15:30:00+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:47:57+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/\"\n\t            },\n\t            \"wordCount\": 606,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/06\\\/computer-platform.jpg\",\n\t            \"keywords\": [\n\t                \"Data Science\",\n\t                \"Exploratory Data Analysis\",\n\t                \"Histogram\",\n\t                \"pandas_profiling\",\n\t                \"Python\",\n\t                
\"sweetviz\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Quant\",\n\t                \"Quant Development\",\n\t                \"Quant North America\",\n\t                \"Quant Regions\",\n\t                \"R Development\"\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/\",\n\t            \"name\": \"Automated EDA with Python | IBKR Quant Blog\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/06\\\/computer-platform.jpg\",\n\t            \"datePublished\": \"2021-08-27T15:30:00+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:47:57+00:00\",\n\t            \"description\": \"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t  
                  \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/automated-eda-with-python\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/06\\\/computer-platform.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/06\\\/computer-platform.jpg\",\n\t            \"width\": 900,\n\t            \"height\": 550,\n\t            \"caption\": \"Quant\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t        
        }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\",\n\t            \"name\": \"Andrew Treadway\",\n\t            \"description\": \"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. 
He completed a master\u2019s degree in computer science \\\/ machine learning, and an undergraduate degree in pure mathematics. Connect with him on LinkedIn: https:\\\/\\\/www.linkedin.com\\\/in\\\/andrew-treadway-a3b19b103\\\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.\",\n\t            \"sameAs\": [\n\t                \"https:\\\/\\\/theautomatic.net\\\/about-me\\\/\"\n\t            ],\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/andrewtreadway\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Automated EDA with Python | IBKR Quant","description":"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/96139\/","og_locale":"en_US","og_type":"article","og_title":"Automated EDA with Python | IBKR Quant Blog","og_description":"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/","og_site_name":"IBKR Campus US","article_published_time":"2021-08-27T15:30:00+00:00","article_modified_time":"2022-11-21T14:47:57+00:00","og_image":[{"width":900,"height":550,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","type":"image\/jpeg"}],"author":"Andrew Treadway","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Treadway","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/"},"author":{"name":"Andrew Treadway","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc"},"headline":"Automated EDA with Python","datePublished":"2021-08-27T15:30:00+00:00","dateModified":"2022-11-21T14:47:57+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/"},"wordCount":606,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","keywords":["Data Science","Exploratory Data Analysis","Histogram","pandas_profiling","Python","sweetviz"],"articleSection":["Data Science","Programming Languages","Quant","Quant Development","Quant North America","Quant Regions","R Development"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/","name":"Automated EDA with Python | IBKR Quant 
Blog","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","datePublished":"2021-08-27T15:30:00+00:00","dateModified":"2022-11-21T14:47:57+00:00","description":"In this post, we will investigate the\u00a0pandas_profiling\u00a0and\u00a0sweetviz\u00a0packages, which can be used to speed up EDA (exploratory data analysis) with Python.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/automated-eda-with-python\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","width":900,"height":550,"caption":"Quant"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive 
Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc","name":"Andrew Treadway","description":"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \/ machine learning, and an undergraduate degree in pure mathematics. 
Connect with him on LinkedIn: https:\/\/www.linkedin.com\/in\/andrew-treadway-a3b19b103\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.","sameAs":["https:\/\/theautomatic.net\/about-me\/"],"url":"https:\/\/www.interactivebrokers.com\/campus\/author\/andrewtreadway\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/06\/computer-platform.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/96139","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/388"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=96139"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/96139\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/47736"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=96139"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=96139"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=96139"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=96139"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}