{"id":202957,"date":"2024-03-01T11:11:00","date_gmt":"2024-03-01T16:11:00","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=202957"},"modified":"2024-03-04T05:41:28","modified_gmt":"2024-03-04T10:41:28","slug":"clean-transform-optimize-the-power-of-data-preprocessing-part-i","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/","title":{"rendered":"Clean, Transform, Optimize: The Power of Data Preprocessing &#8211; Part I"},"content":{"rendered":"\n<p>Data preprocessing is a basic requirement of any good machine learning model. Preprocessing means converting raw data into a form that the machine learning model can read easily.<\/p>\n\n\n\n<p>This essential phase involves identifying and rectifying errors, handling missing values, and transforming data to enhance its suitability for analysis. As the first crucial step in the data preparation journey, preprocessing helps ensure data accuracy and sets the stage for effective modelling. From scaling and encoding to feature engineering, this process unleashes the true potential of datasets, empowering analysts and data scientists to uncover patterns and optimise predictive models.<\/p>\n\n\n\n<p>Dive into the world of data preprocessing to unlock the full potential of your data. 
In this article, we will discuss the basics of data preprocessing and how to make the data suitable for machine learning models.<\/p>\n\n\n\n<p>This article covers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is data preprocessing?<\/li>\n\n\n\n<li>Why is data preprocessing required?<\/li>\n\n\n\n<li>Data that needs data preprocessing<\/li>\n\n\n\n<li>Data preprocessing with Python for different dataset types<\/li>\n\n\n\n<li>Data cleaning vs data preprocessing<\/li>\n\n\n\n<li>Data preparation vs data preprocessing<\/li>\n\n\n\n<li>Data preprocessing vs feature engineering<\/li>\n\n\n\n<li>Where can you learn more about data preprocessing?<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-data-preprocessing\">What is data preprocessing?<\/h2>\n\n\n\n<p>Our comprehensive blog on&nbsp;<a href=\"https:\/\/blog.quantinsti.com\/data-cleaning\/\">data cleaning<\/a>&nbsp;explains data cleaning as a part of preprocessing the data, covering everything from the basics to performance, and more.<\/p>\n\n\n\n<p>After data cleaning, data preprocessing requires the data to be transformed into a format that the machine learning model can understand.<\/p>\n\n\n\n<p>Data preprocessing involves readying raw data to make it suitable for machine learning models. 
This process includes data cleaning, so that the data is ready for input into machine learning models.<\/p>\n\n\n\n<p>Automated data preprocessing is particularly advantageous for large datasets: it improves efficiency and keeps the preparation of data consistent for further analysis or model training.<strong><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1877050919318885\">\u207d\u00b9\u207e<\/a><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why-is-data-preprocessing-required\">Why is data preprocessing required?<\/h2>\n\n\n\n<p>Here, we will discuss the importance of data preprocessing in machine learning. Data preprocessing is essential for the following reasons:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ensuring Accuracy:<\/strong>&nbsp;For machine learning models to read the data accurately, it must be free of missing, redundant, or duplicate values.<\/li>\n\n\n\n<li><strong>Building Trust:<\/strong>&nbsp;The updated data should be as accurate and trustworthy as possible, so that users can have confidence in its reliability.<\/li>\n\n\n\n<li><strong>Enhancing Interpretability:<\/strong>&nbsp;Preprocessed data needs to be correctly interpreted, promoting a better understanding of the information it conveys.<\/li>\n<\/ul>\n\n\n\n<p>In summary, data preprocessing is vital because it lets machine learning models learn from accurate and reliable data, so that they can make correct predictions.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-that-needs-data-preprocessing\">Data that needs data preprocessing<\/h2>\n\n\n\n<p>Since data comes in various formats, there can be certain errors that need to be corrected. 
Let us discuss how different datasets can be converted into the correct format that the ML model can read accurately.<\/p>\n\n\n\n<p>Here, we will see how to feed correct features from datasets with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Missing values &#8211;<\/strong>&nbsp;Incomplete or absent data points within a dataset that require handling through methods like imputation or deletion.<\/li>\n\n\n\n<li><strong>Outliers &#8211;&nbsp;<\/strong>Anomalies or extreme values in a dataset that can skew analysis or modelling results, often addressed through identification and removal techniques.<\/li>\n\n\n\n<li><strong>Overfitting &#8211;<\/strong>&nbsp;A modelling phenomenon where a machine learning algorithm learns the training data too well, capturing noise and hindering generalisation to new, unseen data.<\/li>\n\n\n\n<li><strong>Data with no numerical values &#8211;&nbsp;<\/strong>Non-numeric data, typically categorical or textual, necessitating encoding techniques like one-hot encoding for use in numerical-based models.<\/li>\n\n\n\n<li><strong>Different date formats &#8211;<\/strong>&nbsp;Diverse representations of dates in a dataset, requiring standardisation or conversion to a uniform format for consistency in time-based analyses.<\/li>\n<\/ul>\n\n\n\n<p>Handling each of these cases in the preprocessing stage helps ensure the quality of the data fed to the ML model.<\/p>\n\n\n\n<p><em>Visit the <a href=\"https:\/\/blog.quantinsti.com\/data-preprocessing\/\">QuantInsti<\/a> website for additional resources on this topic and to watch the video by Dr. 
Ernest Chan<\/em><\/p>\n\n\n\n<p><em>Originally posted on <a href=\"https:\/\/blog.quantinsti.com\/data-preprocessing\/\">QuantInsti<\/a> blog.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let us discuss how different datasets can be converted into the correct format that the ML model can read accurately.<\/p>\n","protected":false},"author":186,"featured_media":132255,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,349,338,341],"tags":[16746,7617,806,852,595],"contributors-categories":[13654],"class_list":{"0":"post-202957","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-python-development","10":"category-ibkr-quant-news","11":"category-quant-development","12":"tag-data-cleaning","13":"tag-data-preprocessing","14":"tag-data-science","15":"tag-machine-learning","16":"tag-python","17":"contributors-categories-quantinsti"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Clean, Transform, Optimize: The Power of Data Preprocessing &#8211; Part I<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/202957\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Clean, Transform, Optimize: The Power of Data Preprocessing - Part I | IBKR Campus US\" \/>\n<meta property=\"og:description\" content=\"Let us discuss how different 
datasets can be converted into the correct format that the ML model can read accurately.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-01T16:11:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-04T10:41:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Contributor Author\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Contributor Author\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Contributor Author\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/e823e46b42ca381080387e794318a485\"\n\t            },\n\t            \"headline\": \"Clean, Transform, Optimize: The Power of Data Preprocessing &#8211; Part I\",\n\t            \"datePublished\": \"2024-03-01T16:11:00+00:00\",\n\t            \"dateModified\": \"2024-03-04T10:41:28+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/\"\n\t            },\n\t            \"wordCount\": 634,\n\t            \"commentCount\": 0,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2022\\\/04\\\/data-science-quant.jpg\",\n\t            
\"keywords\": [\n\t                \"Data Cleaning\",\n\t                \"Data Preprocessing\",\n\t                \"Data Science\",\n\t                \"Machine Learning\",\n\t                \"Python\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Python Development\",\n\t                \"Quant\",\n\t                \"Quant Development\"\n\t            ],\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"CommentAction\",\n\t                    \"name\": \"Comment\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#respond\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/\",\n\t            \"name\": \"Clean, Transform, Optimize: The Power of Data Preprocessing - Part I | IBKR Campus US\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": 
\"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2022\\\/04\\\/data-science-quant.jpg\",\n\t            \"datePublished\": \"2024-03-01T16:11:00+00:00\",\n\t            \"dateModified\": \"2024-03-04T10:41:28+00:00\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2022\\\/04\\\/data-science-quant.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2022\\\/04\\\/data-science-quant.jpg\",\n\t            \"width\": 1000,\n\t            \"height\": 563,\n\t            \"caption\": \"Data Science\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": 
{\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            
\"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/e823e46b42ca381080387e794318a485\",\n\t            \"name\": \"Contributor Author\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/contributor-author\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Clean, Transform, Optimize: The Power of Data Preprocessing &#8211; Part I","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/202957\/","og_locale":"en_US","og_type":"article","og_title":"Clean, Transform, Optimize: The Power of Data Preprocessing - Part I | IBKR Campus US","og_description":"Let us discuss how different datasets can be converted into the correct format that the ML model can read accurately.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/","og_site_name":"IBKR Campus US","article_published_time":"2024-03-01T16:11:00+00:00","article_modified_time":"2024-03-04T10:41:28+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","type":"image\/jpeg"}],"author":"Contributor Author","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Contributor Author","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/"},"author":{"name":"Contributor Author","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/e823e46b42ca381080387e794318a485"},"headline":"Clean, Transform, Optimize: The Power of Data Preprocessing &#8211; Part I","datePublished":"2024-03-01T16:11:00+00:00","dateModified":"2024-03-04T10:41:28+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/"},"wordCount":634,"commentCount":0,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","keywords":["Data Cleaning","Data Preprocessing","Data Science","Machine Learning","Python"],"articleSection":["Data Science","Programming Languages","Python Development","Quant","Quant Development"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/","name":"Clean, 
Transform, Optimize: The Power of Data Preprocessing - Part I | IBKR Campus US","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","datePublished":"2024-03-01T16:11:00+00:00","dateModified":"2024-03-04T10:41:28+00:00","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/clean-transform-optimize-the-power-of-data-preprocessing-part-i\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","width":1000,"height":563,"caption":"Data Science"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive 
Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/e823e46b42ca381080387e794318a485","name":"Contributor 
Author","url":"https:\/\/www.interactivebrokers.com\/campus\/author\/contributor-author\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2022\/04\/data-science-quant.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/202957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/186"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=202957"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/202957\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/132255"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=202957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=202957"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=202957"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=202957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}