The article “Trading using GPU-based RAPIDS Libraries from Nvidia” first appeared on the QuantInsti blog.
Don't be deceived by the past. In the rapidly evolving domains of data science and financial machine learning, faster computation and more efficient processing techniques are becoming increasingly important. RAPIDS, a relatively new set of open-source software libraries, is gaining popularity.
RAPIDS leverages GPU capabilities to speed up data science tasks. This post looks at the main aspects of RAPIDS: its libraries, hardware requirements, setup guidelines, practical applications, and drawbacks. Last but not least, as usual, I'm going to offer a trading strategy based on the RAPIDS suite!
We cover:
- Understanding RAPIDS Libraries
- RAPIDS Libraries Installation Guide
- Practical Examples of the RAPIDS Libraries
- A trading strategy using machine learning and the GPU
- Limitations of the Up-to-Date Libraries
Understanding RAPIDS Libraries
The open-source software libraries collectively known as RAPIDS offer a new approach to speeding up data science and machine learning workflows. The libraries can be used individually, but combining them keeps the whole pipeline on the GPU and takes full advantage of its computational and data analysis capabilities.
Let’s look at the main RAPIDS libraries here:
- cuDF: A GPU-accelerated data frame manipulation and operation tool similar to Pandas but optimised for GPUs. It has a Pandas-like user interface and accelerates processing through GPU parallelism.
- cuML: This library is used for machine learning tasks. It provides GPU-accelerated algorithms for various tasks, such as clustering, regression, and classification. These algorithms are made to improve performance without compromising accuracy, which makes them suitable for use with large-scale datasets.
- cuPy: A GPU-accelerated array library that closely mirrors NumPy's interface and enables fast array operations on the GPU. Because it mimics NumPy's functionality, array-based code can be moved to GPU architectures with few changes, increasing computational speed.
Together, these libraries form a single system for data manipulation, analysis, and machine learning that draws on the parallel processing power of GPUs. This acceleration shortens processing times and makes it possible to develop models and analyze data more quickly, which is especially helpful for tasks involving big datasets.
To make the most of GPU-accelerated computing, researchers, machine learning experts, and data scientists must grasp the nuances of the RAPIDS libraries. These libraries provide high-performance computing capabilities along with the ability to speed up and simplify a multitude of data processing tasks.
RAPIDS Libraries Installation Guide
The RAPIDS libraries can be installed using the following steps:
Step 1: System requirements
Before proceeding with the installation, confirm that your system satisfies the requirements. A compatible NVIDIA GPU is essential, because the RAPIDS libraries are optimized for NVIDIA GPUs. RAPIDS runs only on Linux-based operating systems; on Windows, you can use WSL2 to run Ubuntu. Verify that your Linux distribution and version are supported (for example, Ubuntu or CentOS), and install NVIDIA drivers that are compatible with your GPU.
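The checks in Step 1 can be partly automated. The sketch below is a minimal pre-flight script that uses only the Python standard library. The function name check_rapids_prerequisites is hypothetical, the WSL detection is a heuristic, and finding nvidia-smi on the PATH is only a rough proxy for a working NVIDIA driver, not proof of a compatible GPU.

```python
import platform
import shutil


def check_rapids_prerequisites():
    """Return a dict of rough pre-flight checks for a RAPIDS install."""
    system = platform.system()    # "Linux", "Windows", "Darwin", ...
    release = platform.release()  # kernel release string
    return {
        # RAPIDS targets Linux; a WSL2 shell also reports "Linux" here
        "is_linux": system == "Linux",
        # Heuristic: WSL kernels usually carry "microsoft" in the release string
        "looks_like_wsl": "microsoft" in release.lower(),
        # nvidia-smi ships with the NVIDIA driver; its presence is a proxy only
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_rapids_prerequisites().items():
        print(f"{name}: {ok}")
```

If is_linux or nvidia_smi_found comes back False, it is worth fixing the operating system or driver setup before moving on to the Conda steps.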
Step 2: Installing Conda
This guide installs and manages the RAPIDS libraries with Conda, a package and environment manager. Your first step is to install Miniconda or Anaconda, two Python distributions that include Conda.
Follow the installation guidelines on the official website to download and install Miniconda or Anaconda.
For RAPIDS, create a new Conda environment to keep the setup tidy and isolated. The following command can be used to create an environment with the name “rapids” or any other desired name:
conda create -n rapids python=3.10
Step 3: Install the RAPIDS Libraries
Use the following command to activate the Conda environment after it has been created:
conda activate rapids
Next, use the following command to install RAPIDS libraries:
conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.21 python=3.10
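This command installs the RAPIDS suite into the active Conda environment. The rapids=0.21 pin selects the RAPIDS version and python=3.10 the Python version. RAPIDS releases from 21.06 onward use calendar-style version numbers (YY.MM), so adjust the pin to a release that supports your CUDA and Python versions; the release selector on the RAPIDS website generates the matching command.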
Step 4: Verifying the Installation
Once the installation process is complete, you can verify that RAPIDS libraries have been successfully installed in your Conda environment. Open a Python interpreter within the Conda environment and import the desired libraries (e.g., cuDF, cuML, cuPy) to ensure they are accessible and functioning properly.
import cudf
import cuml
import cupy
If the import statements execute without errors, it indicates the successful installation of RAPIDS libraries.
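If you prefer a report over a traceback when something is missing, the verification step can be wrapped in a small helper. This is a minimal sketch using the standard library; the function name check_rapids_imports is hypothetical, and a successful import only shows that the package loads, not that every GPU operation will work.

```python
import importlib


def check_rapids_imports(names=("cudf", "cuml", "cupy")):
    """Try to import each library; map its name to a version string or None."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or a driver/CUDA error raised at import time
            found[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown"
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_rapids_imports().items():
        status = f"OK (version {version})" if version else "NOT AVAILABLE"
        print(f"{name}: {status}")
```

Run it from inside the activated Conda environment, so that the interpreter performing the check is the one RAPIDS was installed into.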
Practical Examples of the RAPIDS Libraries
Let’s see how to use the three libraries described above. The examples give a glimpse of what you can do with them. As you’ll discover, they behave very much like numpy, pandas and scikit-learn, so if you know those libraries you should feel at home. They’re easy to handle and you’ll start coding quickly.
Ready to have some fun?
Let's explore!
cuPy Examples
Example 1: In this example, we create two arrays of 10,000 random numbers each and compute their dot product, which yields a single scalar value.
# Import the cupy library
import cupy as cp

# Create two arrays of 10,000 random numbers each
x = cp.random.rand(10000)
y = cp.random.rand(10000)

# Compute the dot product of both arrays
result = cp.dot(x, y)

# Print the result
print(result)
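As a sanity check on Example 1, it helps to know roughly what number to expect. For two independent arrays drawn uniformly from [0, 1), each pairwise product has an expected value of 0.5 × 0.5 = 0.25, so the dot product of 10,000 pairs should land near 2,500. The CPU-only sketch below reproduces this with the standard library; the seed and the helper name dot are illustrative choices, not part of the original example.

```python
import random


def dot(x, y):
    """Dot product of two equal-length sequences: the sum of pairwise products."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(x, y))


random.seed(0)  # arbitrary seed, only to make the run repeatable
n = 10_000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]

result = dot(x, y)
print(result)    # a single scalar, close to n * 0.25
print(n * 0.25)  # 2500.0, the expected value
```

If the cuPy version prints a value far from 2,500, something other than a dot product of two uniform arrays is being computed.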
Example 2: Here we create two 2×2 matrices, compute their matrix product, and print the resulting matrix.
# Import the corresponding library
import cupy as cp

# Define matrices using CuPy arrays
matrix_a = cp.array([[1, 2], [3, 4]])
matrix_b = cp.array([[5, 6], [7, 8]])

# Perform a matrix multiplication using CuPy
result = cp.matmul(matrix_a, matrix_b)
print("Result of Matrix Multiplication:")
print(result)
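To see what Example 2 should print, the same product can be worked out on the CPU in plain Python. This is a minimal sketch for checking the arithmetic; the helper matmul below is written for illustration and is unrelated to cuPy's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    cols = len(b[0])
    # Entry (i, j) is the dot product of row i of a and column j of b
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


matrix_a = [[1, 2], [3, 4]]
matrix_b = [[5, 6], [7, 8]]

result = matmul(matrix_a, matrix_b)
print(result)  # [[19, 22], [43, 50]]
```

Note that matrix_a * matrix_b on cuPy arrays is element-wise, as in NumPy, and would give [[5, 12], [21, 32]] instead of the matrix product.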
cuDF Examples
Example 1: Next, we create a GPU-based dataframe with two columns, A and B, of three observations each. We then add the two columns and save the result in a new column C. So simple, right?
# Import the corresponding library
import cudf

# Create a GPU DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = cudf.DataFrame(data)

# Perform data manipulation
df['C'] = df['A'] + df['B']
print(df)
Example 2: Here we create a pandas dataframe from a dictionary. We then copy the pandas dataframe to GPU memory using the cudf library and print both dataframes.
# Import the libraries
import pandas as pd
import cudf

# Creating a Pandas DataFrame
pandas_data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'c', 'd', 'e']
}
pandas_df = pd.DataFrame(pandas_data)

# Display the Pandas DataFrame
print("Pandas DataFrame:")
print(pandas_df)

# Convert Pandas DataFrame to cuDF DataFrame
cudf_df = cudf.DataFrame.from_pandas(pandas_df)

# Display the cuDF DataFrame
print("cuDF DataFrame:")
print(cudf_df)
cuML Examples
Example 1: In this example, we build a GPU dataframe from two cupy arrays of 1,000 random numbers each and use it to fit a k-means clustering model with the cuml library. We then predict the cluster label of each observation.
# Import the libraries
import cudf
import cupy as cp
from cuml.cluster import KMeans

# Generate sample data: two features with 1,000 random values each
data = cudf.DataFrame()
data['feature1'] = cp.random.rand(1000)
data['feature2'] = cp.random.rand(1000)

# Fit KMeans with 3 clusters and assign each row to a cluster
kmeans = KMeans(n_clusters=3)
kmeans.fit(data)
labels = kmeans.predict(data)
print(labels)
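To make clearer what fit and predict compute in the k-means example, here is a CPU-only sketch of the classic Lloyd's algorithm in plain Python. It is an illustration under simplifying assumptions (random data points as starting centroids, a fixed number of iterations) and not a description of cuML's implementation, which uses its own initialisation and runs in parallel on the GPU. The function names assign and fit_kmeans are hypothetical.

```python
import random


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance from p to every centroid
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def fit_kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(1000)]

centroids = fit_kmeans(points, k=3)  # analogous to kmeans.fit(data)
labels = assign(points, centroids)   # analogous to kmeans.predict(data)
print(labels[:10])
```

On uniformly random data like this there are no natural clusters, so the algorithm simply partitions the unit square into three regions; the labels are only meaningful relative to the fitted centroids.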
Example 2: Finally, in this example, we generate a synthetic classification dataset (features and labels) with the cuml library. We then split it into training and test sets, fit a random forest classifier on the training set, predict on the test features, and print the first 10 predictions.
# Import the libraries
from cuml.datasets import make_classification
from cuml.ensemble import RandomForestClassifier
from cuml.model_selection import train_test_split

# Generating a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and fitting a Random Forest Classifier using cuML
rf_classifier = RandomForestClassifier(n_estimators=100)
rf_classifier.fit(X_train, y_train)

# Predicting on the test set
y_pred = rf_classifier.predict(X_test)

# Displaying sample predictions
print("Sample predictions:")
print(y_pred[:10])
Did you notice?
It’s just like coding with the CPU-based libraries. Smooth, right?
Stay tuned for the second part to learn about trading strategy using machine learning and the GPU.
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.