The post “A Step by Step Guide to Write an R package That Uses C++ Code (Ubuntu)” was originally posted on Pacha.dev Blog.
Motivation
A large part of my research requires estimating computationally intensive models, such as the General Equilibrium Poisson Pseudo Maximum Likelihood (GEPPML) estimator, derived from the equilibrium conditions introduced by Anderson and Van Wincoop (2004) for estimation and inference.
The GEPPML estimator requires solving a system of non-linear equations, and for this task we might be better off using a compiled language such as C++. The good news is that we can use C++ code within R and Python, and this blog post is about using C++ functions from R.
Also, I do not pretend to be an expert on C++, nor do I want to debate whether R is better than Python. I use both from Visual Studio Code. I do want to share my experience of using C++ code within R.
Honest disclaimer
This blog post is a summary, written for my future self, of what worked after hours of failed attempts. I hope it helps you too.
I am a Statistician and Political Scientist, not a Computer Scientist!
Setup
Because I have already been learning C++11, I decided to install llvm-11 on my laptop, which runs Linux Mint, a distribution based on Ubuntu 22.04.
Ubuntu and its derived distributions use gcc as the default C++ compiler, and clang is not installed by default. Different resources mention that clang provides more informative error messages when compilation fails and when we debug code.
Informative error messages are a highly useful resource when we are learning C++ or when our code fails in one of two ways: either it does not compile, or it compiles but crashes the R session when we call a function from RStudio (or VSCode).
I installed the R packages cpp11 and usethis.
install.packages(c("cpp11", "usethis"))
I created a file ~/.Rprofile containing the following lines (devtools must be installed as well).
library(devtools)
library(usethis)
library(cpp11)
I ran nano ~/.Rprofile from bash, saved the file with CTRL+O and ENTER, and closed the editor with CTRL+X. This is the same as creating the file with the text editor from Gnome or any other desktop environment.
Now forget about typing devtools::install(). After reopening your editor, every time you use RStudio (or VSCode) you can just call install(), and the same applies to the usethis::use_*() and cpp11::cpp_*() functions.
To install llvm-11 I downloaded the installation script from the official LLVM repository, which also installed clang-11.
cd Downloads
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 11
At this point I still got the following error messages when compiling C++ code.
fatal error: 'cstdio' file not found
fatal error: 'vector' file not found
cannot find -lc++abi: No such file or directory
I had to install additional packages. It took me a few hours of searching on the Internet to figure this out.
sudo apt install g++-11 libc++-11-dev libc++abi-11-dev
To be sure that the install() function in R uses the correct version of clang++, I created the ~/.R/Makevars file. The contents of the file are the following.
CLANGVER=-11
CLANGLIB=-stdlib=libc++
CXX=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX11=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CC=$(CCACHE) clang$(CLANGVER)
SHLIB_CXXLD=clang++$(CLANGVER) $(CLANGLIB)
CXXFLAGS=-Wall -O0 -pedantic
CXX11FLAGS=-Wall -O0 -pedantic
For both CXXFLAGS and CXX11FLAGS I am using -O0 to disable optimization, which is useful for debugging. Once the code is working, I can change it to -O3 to optimize the compiled code.
If later on I need to compile with gcc, I can open ~/.R/Makevars, comment out all the lines, restart RStudio or VSCode, and run install() again.
If you close RStudio (or VSCode) and open it again, you can check that the changes were applied by running pkgbuild::check_build_tools(debug = TRUE), which should return the following output.
Trying to compile a simple C file
Running /usr/lib/R/bin/R CMD SHLIB foo.c
using C compiler: ‘Ubuntu clang version 11.1.0-6’
clang-11 -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-JhpCKt/r-base-4.3.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c foo.c -o foo.o
clang-11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o foo.so foo.o -L/usr/lib/R/lib -lR
If I were using gcc, the output would have been as in the following lines.
Trying to compile a simple C file
Running /usr/lib/R/bin/R CMD SHLIB foo.c
using C compiler: ‘gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0’
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-JhpCKt/r-base-4.3.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c foo.c -o foo.o
gcc -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o foo.so foo.o -L/usr/lib/R/lib -lR
The key here is that when I use clang the compiler lines start with clang, not with gcc.
Creating a dummy package
From RStudio (or VSCode) we can create a new package by running create_package("~/cpp11dummypackage"). This will create a new folder named cpp11dummypackage. Then I ran use_cpp11() to add the files required to use C++ code within R.
Then I ran use_r("cpp11dummypackage-package") to create a new R script file named cpp11dummypackage-package.R within the R folder, and added the following code to it.
#' @useDynLib cpp11dummypackage, .registration = TRUE
NULL
The usethis skeleton also created the file src/code.cpp for us. I added a simple function to transpose a matrix to it, replacing the file contents with the following lines.
#include <cpp11.hpp>

using namespace cpp11;

[[cpp11::register]] doubles_matrix<> transpose_(doubles_matrix<> X) {
  int NX = X.nrow();  // rows of X
  int MX = X.ncol();  // columns of X

  // the transpose has MX rows and NX columns
  writable::doubles_matrix<> R(MX, NX);

  for (int i = 0; i < MX; i++) {
    for (int j = 0; j < NX; j++) {
      R(i, j) = X(j, i);
    }
  }

  return R;
}
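To check the index logic without R or cpp11, here is a minimal standalone sketch of the same transposition in plain C++. It assumes the matrix is stored as a flat vector in column-major order, which is how R stores matrices, so element (i, j) of a matrix with nrow rows sits at position i + j * nrow. The function name transpose_colmajor is hypothetical and is not part of cpp11 or of the package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose an nrow x ncol matrix stored in column-major order,
// i.e. element (i, j) lives at position i + j * nrow (as in R).
// Returns the ncol x nrow transpose, also in column-major order.
// Hypothetical helper for illustration only.
std::vector<double> transpose_colmajor(const std::vector<double>& x,
                                       std::size_t nrow, std::size_t ncol) {
  assert(x.size() == nrow * ncol);
  std::vector<double> r(x.size());
  for (std::size_t i = 0; i < ncol; i++) {    // row index of the result
    for (std::size_t j = 0; j < nrow; j++) {  // column index of the result
      // R(i, j) = X(j, i); the result has ncol rows
      r[i + j * ncol] = x[j + i * nrow];
    }
  }
  return r;
}
```

For the 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the call transpose_colmajor(x, 2, 3) returns {1, 3, 5, 2, 4, 6}, and transposing that result again recovers the input.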
In order to export the function, I added the following lines to cpp11dummypackage-package.R.
#' Transpose a matrix
#' @export
#' @param X numeric matrix
#' @return numeric matrix
#' @examples
#' set.seed(1234)
#' X <- matrix(rnorm(4), nrow = 2, ncol = 2)
#' X
#' transpose(X)
transpose <- function(X) {
  transpose_(X)
}
I tested the function after running cpp_register() and load_all().
> set.seed(1234)
> X <- matrix(rnorm(4), nrow = 2, ncol = 2)
> X
           [,1]      [,2]
[1,] -1.2070657  1.084441
[2,]  0.2774292 -2.345698
> transpose(X)
          [,1]       [,2]
[1,] -1.207066  0.2774292
[2,]  1.084441 -2.3456977
If I had passed 1:4 instead of rnorm(4) to matrix(), I would have obtained the following error message.
> transpose(X)
Error: Invalid input type, expected 'double' actual 'integer'
This is because I declared the function to accept a doubles_matrix<> as input, and not an integers_matrix<>.
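The error reflects strict typing at the boundary between R and C++: 1:4 is an integer vector, and the function only accepts doubles. On the R side, one way to avoid the error is to convert the matrix before the call, for example with storage.mode(X) <- "double". The standalone sketch below (plain C++, no cpp11) shows the same idea under that assumption: a routine that only accepts doubles, plus an explicit conversion step for integer input. The names mean_of and to_doubles are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical double-only routine, standing in for transpose_():
// it cannot be called with a std::vector<int> directly.
double mean_of(const std::vector<double>& x) {
  assert(!x.empty());
  double s = 0.0;
  for (double v : x) s += v;
  return s / static_cast<double>(x.size());
}

// Explicit conversion, analogous to converting the R matrix to double
// before passing it to the C++ function.
std::vector<double> to_doubles(const std::vector<int>& x) {
  return std::vector<double>(x.begin(), x.end());
}
```

Calling mean_of(ints) with a std::vector<int> does not compile, while mean_of(to_doubles(ints)) does, which mirrors converting the input in R before calling transpose().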
To install the newly created package, I ran the following lines in the R console.
clean_dll()
cpp_register()
document()
install()
Debugging the package
In order to access debugging symbols, I created a new Makevars file within the src folder, and added the following lines.
CXX_STD = CXX11
PKG_CPPFLAGS = -UDEBUG -g
Then I reinstalled the package compiled with debugging symbols, and in bash I ran R -d lldb-11. From there I could follow this excellent guide to debug R and C++ code.
I shouldn’t generally leave the -g flag on in a Makevars file: it inserts debugging symbols in the compiled binary, which increases compilation times (often by a large margin) and creates larger binaries. Once the package compiles and I am sure that it works properly, I need to remove the PKG_CPPFLAGS = -UDEBUG -g line.
A more complex example
I created a package containing a set of simple functions to obtain the Ordinary Least Squares (OLS) estimator by calling a C++ function that calls other C++ functions. My approach was to create one function per step: one function to obtain $X^tX$, another for $(X^tX)^{-1}$ that implements the Gauss-Jordan method to invert a matrix, another for $X^tY$, and a final one that calls each of those functions to obtain $\hat{\beta} = (X^tX)^{-1}X^tY$.
This implementation is extremely naive, but it is enough to show how to use C++ code within R. Please see it on my GitHub profile.
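The one-function-per-step idea can be sketched in plain C++ as follows. This is a minimal standalone illustration and not the code from the package: it uses std::vector instead of cpp11 types, all function names (xtx, xty, invert_gauss_jordan, ols) are hypothetical, and the 1e-12 pivot tolerance is an arbitrary assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: m[i][j] is row i, column j

// Step 1: X^t X (k x k) for an n x k design matrix X
Mat xtx(const Mat& X) {
  const std::size_t n = X.size(), k = X[0].size();
  Mat A(k, Vec(k, 0.0));
  for (std::size_t i = 0; i < k; i++)
    for (std::size_t j = 0; j < k; j++)
      for (std::size_t r = 0; r < n; r++) A[i][j] += X[r][i] * X[r][j];
  return A;
}

// Step 2: X^t y (length k)
Vec xty(const Mat& X, const Vec& y) {
  const std::size_t n = X.size(), k = X[0].size();
  assert(y.size() == n);
  Vec b(k, 0.0);
  for (std::size_t i = 0; i < k; i++)
    for (std::size_t r = 0; r < n; r++) b[i] += X[r][i] * y[r];
  return b;
}

// Step 3: inverse by Gauss-Jordan elimination with partial pivoting
Mat invert_gauss_jordan(Mat A) {
  const std::size_t k = A.size();
  Mat I(k, Vec(k, 0.0));
  for (std::size_t i = 0; i < k; i++) I[i][i] = 1.0;
  for (std::size_t c = 0; c < k; c++) {
    // pick the row with the largest absolute value in column c
    std::size_t p = c;
    for (std::size_t r = c + 1; r < k; r++)
      if (std::fabs(A[r][c]) > std::fabs(A[p][c])) p = r;
    if (std::fabs(A[p][c]) < 1e-12) throw std::runtime_error("singular matrix");
    std::swap(A[c], A[p]);
    std::swap(I[c], I[p]);
    // scale the pivot row so that the pivot becomes 1
    const double piv = A[c][c];
    for (std::size_t j = 0; j < k; j++) {
      A[c][j] /= piv;
      I[c][j] /= piv;
    }
    // eliminate column c from every other row
    for (std::size_t r = 0; r < k; r++) {
      if (r == c) continue;
      const double f = A[r][c];
      for (std::size_t j = 0; j < k; j++) {
        A[r][j] -= f * A[c][j];
        I[r][j] -= f * I[c][j];
      }
    }
  }
  return I;
}

// Step 4: beta = (X^t X)^{-1} X^t y
Vec ols(const Mat& X, const Vec& y) {
  const Mat inv = invert_gauss_jordan(xtx(X));
  const Vec b = xty(X, y);
  Vec beta(b.size(), 0.0);
  for (std::size_t i = 0; i < b.size(); i++)
    for (std::size_t j = 0; j < b.size(); j++) beta[i] += inv[i][j] * b[j];
  return beta;
}
```

With a column of ones plus x = 0, 1, 2, 3 as the design matrix and y = 1, 3, 5, 7, the data lie exactly on y = 1 + 2x, so ols() should recover an intercept of 1 and a slope of 2 up to floating point error.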
A good challenge would be to implement the QR decomposition used by the lm() function in R and use it to obtain the OLS estimator in C++. This would require some effort, but here you can find a good starting point.
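As a rough idea of what a QR-based solution looks like, here is a minimal standalone sketch in plain C++. It is not what lm() does internally: as far as I know, lm() relies on a Householder-based QR routine, while this sketch swaps in the simpler modified Gram-Schmidt procedure to factor X = QR and then solves R beta = Q^t y by back substitution. The function name ols_qr and the 1e-12 tolerance are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: m[i][j] is row i, column j

// Least squares via thin QR (modified Gram-Schmidt):
// factor X = QR, then solve R beta = Q^t y by back substitution.
Vec ols_qr(const Mat& X, const Vec& y) {
  const std::size_t n = X.size(), k = X[0].size();
  assert(y.size() == n && n >= k);
  Mat Q = X;              // columns are orthonormalized in place
  Mat R(k, Vec(k, 0.0));  // upper triangular factor
  for (std::size_t j = 0; j < k; j++) {
    // normalize column j
    double norm = 0.0;
    for (std::size_t r = 0; r < n; r++) norm += Q[r][j] * Q[r][j];
    norm = std::sqrt(norm);
    if (norm < 1e-12) throw std::runtime_error("rank-deficient design matrix");
    R[j][j] = norm;
    for (std::size_t r = 0; r < n; r++) Q[r][j] /= norm;
    // remove the component along column j from the remaining columns
    for (std::size_t c = j + 1; c < k; c++) {
      double dot = 0.0;
      for (std::size_t r = 0; r < n; r++) dot += Q[r][j] * Q[r][c];
      R[j][c] = dot;
      for (std::size_t r = 0; r < n; r++) Q[r][c] -= dot * Q[r][j];
    }
  }
  // compute Q^t y
  Vec qty(k, 0.0);
  for (std::size_t j = 0; j < k; j++)
    for (std::size_t r = 0; r < n; r++) qty[j] += Q[r][j] * y[r];
  // back substitution, from the last coefficient to the first
  Vec beta(k, 0.0);
  for (std::size_t i = k; i-- > 0;) {
    double s = qty[i];
    for (std::size_t j = i + 1; j < k; j++) s -= R[i][j] * beta[j];
    beta[i] = s / R[i][i];
  }
  return beta;
}
```

The appeal of this route is that it never forms or inverts X^t X, which tends to be numerically safer than the normal equations when the columns of X are nearly collinear.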
In any case, it would be extremely hard to beat the performance of the lm() function in R, which has some internals written in C, and the numerical robustness of lm() is another feature that is hard to beat.
References
- Debugging in R with a single command
- Debugging an R package with C++
- Clang++ missing C++ header?
- How to I tell RStudio not to ignore the indication to use clang in Makevars?
- R’s Makevars: PKG_CXXFLAGS vs. PKG_CXX11FLAGS
- Debugging memory errors with valgrind and gdb
- A Deep Dive Into How R Fits a Linear Model