{"id":194377,"date":"2023-08-04T10:00:00","date_gmt":"2023-08-04T14:00:00","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=194377"},"modified":"2023-08-04T10:00:08","modified_gmt":"2023-08-04T14:00:08","slug":"getting-data-from-pdfs-the-easy-way-with-r","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/","title":{"rendered":"Getting Data from PDFs the Easy Way with R"},"content":{"rendered":"\n<p><em>Originally posted on <a href=\"https:\/\/theautomatic.net\/2018\/08\/24\/getting-data-from-pdfs-the-easy-way-with-r\/\">TheAutomatic.net<\/a>.<\/em><\/p>\n\n\n\n<p><em>Excerpt<\/em><\/p>\n\n\n\n<p>If you don\u2019t have&nbsp;<strong>tabulizer<\/strong>&nbsp;installed, just run&nbsp;<strong>install.packages(\"tabulizer\")<\/strong>&nbsp;to get started.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-initial-setup\"><strong>Initial Setup<\/strong><\/h2>\n\n\n\n<p>After you have&nbsp;<strong>tabulizer<\/strong>&nbsp;installed, we\u2019ll load it, and define a variable referencing an example PDF.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">library(tabulizer)\n \nsite &lt;- \"https:\/\/www.sedl.org\/afterschool\/toolkits\/science\/pdf\/ast_sci_data_tables_sample.pdf\"<\/pre>\n\n\n\n<p>The PDFs you manipulate with this package don\u2019t have to be located on your machine; you can use&nbsp;<strong>tabulizer<\/strong>&nbsp;to reference a PDF by a URL. 
For our first example, we\u2019re going to use a sample PDF file found here:&nbsp;<a href=\"https:\/\/www.sedl.org\/afterschool\/toolkits\/science\/pdf\/ast_sci_data_tables_sample.pdf\">https:\/\/www.sedl.org\/afterschool\/toolkits\/science\/pdf\/ast_sci_data_tables_sample.pdf<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to extract all the tables from a PDF<\/strong><\/h2>\n\n\n\n<p>You can extract tables from this PDF using the aptly-named&nbsp;<em>extract_tables<\/em>&nbsp;function, like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># default call with no parameters changed\nmatrix_results &lt;- extract_tables(site)\n \n# get back the tables as data frames, keeping their headers\ndf_results &lt;- extract_tables(site, output = \"data.frame\", header = TRUE)<\/pre>\n\n\n\n<p>By default, this function will return a matrix for each table, as in the first line of code above. 
However, as in the second call, we can set the output argument to&nbsp;<strong>data.frame<\/strong>&nbsp;and header =&nbsp;<strong>TRUE<\/strong>&nbsp;to get back a list of data frames, one for each table in the PDF.<\/p>\n\n\n\n<p>Once we have the results back, we can work with any individual table just as we would any other data frame in R.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">first_df &lt;- df_results[[1]]\n \nfirst_df$Number.of.Coils<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to scrape text from a PDF<\/strong><\/h2>\n\n\n\n<p>Scraping text from our sample PDF can be done using&nbsp;<em>extract_text<\/em>:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">text &lt;- extract_text(site)\n \n# print text\ncat(text)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to split up a PDF by its pages<\/strong><\/h2>\n\n\n\n<p><strong>tabulizer<\/strong>&nbsp;can also create separate files for the pages in a PDF. 
This can be done using the&nbsp;<em>split_pdf<\/em>&nbsp;function:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># split PDF referenced above\n# output separate page files to current directory\nsplit_pdf(site, getwd())\n \n# or output to different directory\nsplit_pdf(site, \"C:\/path\/to\/other\/folder\")<\/pre>\n\n\n\n<p>The first argument of&nbsp;<em>split_pdf<\/em>&nbsp;is the filename or URL of your PDF; the second argument is the directory where you want the individual pages to be output.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to merge a collection of PDFs<\/strong><\/h2>\n\n\n\n<p>What if we want to reverse what we just did? We can use the&nbsp;<em>merge_pdfs<\/em>&nbsp;function, which takes as input a vector of file names and the name of the output file that will hold the merged result.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pdf_files &lt;- list.files(\"C:\/path\/to\/pdf\/files\", pattern = \"[.]pdf$\", full.names = TRUE)\nmerge_pdfs(pdf_files, \"C:\/path\/to\/merged_result.pdf\")<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to get the number of pages in a PDF<\/strong><\/h2>\n\n\n\n<p>Getting the number of pages in a PDF is made easy with the&nbsp;<em>get_n_pages<\/em>&nbsp;function, which you can call like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">get_n_pages(site)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to get metadata associated with a 
PDF<\/strong><\/h2>\n\n\n\n<p>You can get metadata associated with our PDF using&nbsp;<em>extract_metadata<\/em>:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">extract_metadata(site)<\/pre>\n\n\n\n<p>This function returns a list containing information showing the number of pages, title, created \/ modified dates, and more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.<\/p>\n","protected":false},"author":388,"featured_media":194443,"comment_status":"open","ping_status":"closed","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[343,338,341,342],"tags":[806,487,6591,15684],"contributors-categories":[13695],"class_list":{"0":"post-194377","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-programing-languages","8":"category-ibkr-quant-news","9":"category-quant-development","10":"category-r-development","11":"tag-data-science","12":"tag-r","13":"tag-rstats","14":"tag-tabulizer-package","15":"contributors-categories-theautomatic-net"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Getting Data from PDFs the Easy Way with R | IBKR Quant<\/title>\n<meta name=\"description\" content=\"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, 
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/194377\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Getting Data from PDFs the Easy Way with R | IBKR Campus US\" \/>\n<meta property=\"og:description\" content=\"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-04T14:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-04T14:00:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrew Treadway\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Treadway\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Andrew Treadway\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\"\n\t            },\n\t            \"headline\": \"Getting Data from PDFs the Easy Way with R\",\n\t            \"datePublished\": \"2023-08-04T14:00:00+00:00\",\n\t            \"dateModified\": \"2023-08-04T14:00:08+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/\"\n\t            },\n\t            \"wordCount\": 435,\n\t            \"commentCount\": 0,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/08\\\/r-programming-coffee-laptop-desk.jpg\",\n\t            \"keywords\": [\n\t                \"Data Science\",\n\t                \"R\",\n\t                
\"rstats\",\n\t                \"tabulizer package\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Programming Languages\",\n\t                \"Quant\",\n\t                \"Quant Development\",\n\t                \"R Development\"\n\t            ],\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"CommentAction\",\n\t                    \"name\": \"Comment\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#respond\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/\",\n\t            \"name\": \"Getting Data from PDFs the Easy Way with R | IBKR Campus US\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/08\\\/r-programming-coffee-laptop-desk.jpg\",\n\t            \"datePublished\": 
\"2023-08-04T14:00:00+00:00\",\n\t            \"dateModified\": \"2023-08-04T14:00:08+00:00\",\n\t            \"description\": \"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/getting-data-from-pdfs-the-easy-way-with-r\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/08\\\/r-programming-coffee-laptop-desk.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/08\\\/r-programming-coffee-laptop-desk.jpg\",\n\t            \"width\": 1000,\n\t            \"height\": 563,\n\t            \"caption\": \"Quant\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    
\"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            
\"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\",\n\t            \"name\": \"Andrew Treadway\",\n\t            \"description\": \"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \\\/ machine learning, and an undergraduate degree in pure mathematics. Connect with him on LinkedIn: https:\\\/\\\/www.linkedin.com\\\/in\\\/andrew-treadway-a3b19b103\\\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.\",\n\t            \"sameAs\": [\n\t                \"https:\\\/\\\/theautomatic.net\\\/about-me\\\/\"\n\t            ],\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/andrewtreadway\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Getting Data from PDFs the Easy Way with R | IBKR Quant","description":"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/194377\/","og_locale":"en_US","og_type":"article","og_title":"Getting Data from PDFs the Easy Way with R | IBKR Campus US","og_description":"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/","og_site_name":"IBKR Campus US","article_published_time":"2023-08-04T14:00:00+00:00","article_modified_time":"2023-08-04T14:00:08+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","type":"image\/jpeg"}],"author":"Andrew Treadway","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Treadway","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/"},"author":{"name":"Andrew Treadway","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc"},"headline":"Getting Data from PDFs the Easy Way with R","datePublished":"2023-08-04T14:00:00+00:00","dateModified":"2023-08-04T14:00:08+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/"},"wordCount":435,"commentCount":0,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","keywords":["Data Science","R","rstats","tabulizer package"],"articleSection":["Programming Languages","Quant","Quant Development","R Development"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/","name":"Getting Data from PDFs the Easy Way with R | IBKR Campus 
US","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","datePublished":"2023-08-04T14:00:00+00:00","dateModified":"2023-08-04T14:00:08+00:00","description":"After you have\u00a0tabulizer\u00a0installed, we\u2019ll load it, and define a variable referencing an example PDF.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/getting-data-from-pdfs-the-easy-way-with-r\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","width":1000,"height":563,"caption":"Quant"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive 
Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc","name":"Andrew Treadway","description":"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \/ machine learning, and an undergraduate degree in pure mathematics. 
Connect with him on LinkedIn: https:\/\/www.linkedin.com\/in\/andrew-treadway-a3b19b103\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.","sameAs":["https:\/\/theautomatic.net\/about-me\/"],"url":"https:\/\/www.interactivebrokers.com\/campus\/author\/andrewtreadway\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/08\/r-programming-coffee-laptop-desk.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/194377","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/388"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=194377"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/194377\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/194443"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=194377"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=194377"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=194377"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=194377"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}