Updated on Apr 1, 2024

Set Up a Remote NVIDIA AI Workbench Node Using EC2

Tooling

A new environment for GPU-enabled experimentation

NVIDIA's AI Workbench is a tool that allows developers, data science teams, and beginners in particular to iterate quickly on experiments and ship the successful ones.

Using GPU acceleration for Machine Learning workloads, whether training or inference, often brings order-of-magnitude improvements in throughput and latency, and with them much faster experiment iteration for developers.

However, setting up CUDA-enabled GPU nodes has traditionally had a high barrier to entry, as GPU drivers, the underlying host architecture and additional virtualization layers needed to be aligned. AI Workbench removes the complexity that often derails beginners from deploying GPU-accelerated Machine Learning workloads. By shipping with straightforward installation scripts, AI Workbench allows you to go from fresh EC2 Instance to a fully-configured, remotely accessible playground in minutes.

Though NVIDIA's own documentation covers remote installations of AI Workbench in detail, today we'll provide a step-by-step tutorial on setting up an EC2 Instance as a Remote AI Workbench compute node.

AMIs, Compute Nodes and other Prerequisites

⚠️ Note: by continuing with this tutorial you will be experimenting with AWS services that go beyond the AWS Free Tier limits, and you will incur EC2 costs by continuing. Always plan your AWS costs ahead of time to avoid unexpected charges.

ℹ️ Before we begin listing the prerequisites for AI Workbench, be aware that AWS places limits on certain instance types in order to prevent accidental deployments and runaway costs. Before you can deploy a GPU-enabled EC2 instance such as the g4dn.xlarge (NVIDIA Tesla T4 GPU), you may need to follow this AWS document, which details the steps to request a service quota increase for EC2.
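If you script your infrastructure, the same request can be submitted through the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured; the quota code below is the one currently documented for "Running On-Demand G and VT instances", but verify it in the Service Quotas console before submitting:

bash
# Request a vCPU quota of 8 for On-Demand G and VT instances
# (g4dn.xlarge uses 4 vCPUs)
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-DB2E81BA \
  --desired-value 8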

NVIDIA's own requirements for deploying a Remote instance of AI Workbench are:

  • Ubuntu 22.04 operating system
  • 16GB of RAM
  • 500MB disk space for the NVIDIA AI Workbench application
  • 30 to 40GB of disk space for containers
  • Access: SSH access as a non-root user with sudo privileges via a public/private key pair
    • The private key can't be password-protected

Firstly, the operating system. AI Workbench requires at least Ubuntu 22.04. AWS conveniently provides ready-built AMIs (Amazon Machine Images) for different operating systems.

To find a compatible Ubuntu 22.04 AMI, navigate to AWS Console > EC2 > AMI Catalog > Community AMIs, where we will search for ubuntu 22.04.

AWS AMI Catalog Console

Great! We've found an AMI that is Ubuntu 22.04 running on x86_64 architecture. Let's make note of that AMI ID: ami-06c4be2792f419b7b

ℹ️ Note: not all AMIs are available in all AWS regions. Before deciding to deploy to a particular region, ensure that an Ubuntu 22.04 AMI is available there.
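To check which Ubuntu 22.04 AMIs exist in a given region programmatically, here is a hedged AWS CLI sketch (099720109477 is Canonical's official AWS account ID; the resulting AMI ID will differ per region):

bash
# Find the most recent official Ubuntu 22.04 (Jammy) x86_64 AMI in a region
aws ec2 describe-images \
  --region us-east-1 \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
  --query "sort_by(Images, &CreationDate)[-1].[ImageId,Name]" \
  --output text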

Now that we have our Operating System sorted, let's fulfill the other prerequisites.

For this we will look for EC2 instances that have NVIDIA GPUs. This AWS Article is a helpful guide offering a mapping between AWS Instance Type and NVIDIA GPU type.

⚠️ For running AI Workbench, an NVIDIA GPU is required; AWS Graviton2-based g5g instances and AWS Inferentia (Inf1, Inf2) or Trainium (Trn1, Trn1n) instances will not be compatible.

For this tutorial, we will go with a cost-effective g4dn.xlarge instance, which features an NVIDIA Tesla T4 GPU, and fulfills all the system requirements:

| Requirement | AI Workbench | g4dn.xlarge specs |
| --- | --- | --- |
| RAM | 16GB | 16GB |
| Storage | 500MB + ~30-40GB | 125GB NVMe SSD |
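If you'd like to double-check these specs from the command line, a small sketch using the AWS CLI:

bash
# Confirm vCPUs, memory and GPU model for g4dn.xlarge
aws ec2 describe-instance-types \
  --instance-types g4dn.xlarge \
  --query "InstanceTypes[0].{vCPUs:VCpuInfo.DefaultVCpus,MemoryMiB:MemoryInfo.SizeInMiB,GPU:GpuInfo.Gpus[0].Name}" \
  --output table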

Let's launch this instance on AWS:

Head to AWS Console > EC2 > Instances > Launch instances:

We will provide the following configuration (an equivalent AWS CLI invocation is sketched after the list):

  • Name: NVIDIA AI Workbench 01
  • AMI: (ID we identified above, in our case ami-06c4be2792f419b7b, selecting x86_64 for architecture)
  • Instance Type: g4dn.xlarge
  • Key Pair: Create new key pair >
    • Key pair name: nvidia_ai_workbench_key_01
    • Key pair type: RSA
    • Private key file format: .pem (for use with OpenSSH)

ℹ️ Save this key for later

  • Network settings: (keep your default settings)
    • Allow SSH traffic from: Anywhere 0.0.0.0/0
  • Storage: 1x 64GiB gp2 Root volume (increase from the default 8GiB so the ~30-40GB of container storage required above will fit)
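For teams scripting this setup, here is a hedged AWS CLI equivalent of the console steps above. This is a sketch, not a definitive invocation: the AMI ID is region-specific, and you may want to add --security-group-ids and --subnet-id for your own VPC:

bash
# Create the key pair first and save the private key locally
# (--key-format requires AWS CLI v2)
aws ec2 create-key-pair \
  --key-name nvidia_ai_workbench_key_01 \
  --key-type rsa \
  --key-format pem \
  --query "KeyMaterial" \
  --output text > nvidia_ai_workbench_key_01.pem
chmod 400 nvidia_ai_workbench_key_01.pem

# Launch the instance with a 64GiB gp2 root volume
aws ec2 run-instances \
  --image-id ami-06c4be2792f419b7b \
  --instance-type g4dn.xlarge \
  --key-name nvidia_ai_workbench_key_01 \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=64,VolumeType=gp2}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=NVIDIA AI Workbench 01}]'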

Click Launch instance. After a few seconds, your instance should be in the Running state.

Accessing our Remote instance

Finally, we're ready to install AI Workbench on our GPU-enabled EC2 Instance that we've just deployed.

Using the SSH key we created and downloaded earlier, let's connect to our instance via SSH.

Click on your new instance and find its Public IPv4 DNS:

EC2 Console
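If you prefer the terminal, the same address can be retrieved with the AWS CLI (a sketch, assuming the Name tag we set at launch):

bash
# Look up the public DNS name of our running Workbench instance
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=NVIDIA AI Workbench 01" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].PublicDnsName" \
  --output text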

We will have to restrict the permissions on the key we just created in AWS so that our SSH client allows us to log in to the remote instance:

bash
chmod 400 nvidia_ai_workbench_key_01.pem

Now let's log in via SSH, using the Public IPv4 DNS address we identified above:

bash
ssh -i nvidia_ai_workbench_key_01.pem ubuntu@ec2-....compute.amazonaws.com
text
The authenticity of host 'ec2-....compute.amazonaws.com (18.x.x.x)' can't be established.
ED25519 key fingerprint is SHA256:ca+....
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 6.5.0-1014-aws x86_64)

...

ubuntu@ip-172-x-x-x:~$

Great! We're in!
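Before installing anything, it's worth a quick sanity check that the node matches the prerequisites we listed earlier:

bash
lsb_release -ds   # should report Ubuntu 22.04.x LTS
free -h           # ~16GB of RAM on a g4dn.xlarge
df -h /           # confirm the root volume has room for the 30-40GB of containers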

Installing AI Workbench Remote

NVIDIA provides a convenient install script that we can fetch from their repository.

And while NVIDIA's Documentation provides clear instructions on installing AI Workbench using the Text User Interface (TUI), we will demonstrate how AI Workbench can be installed via Command-Line Interface (CLI) flags.

ℹ️ Using the CLI is useful if installing AI Workbench is part of an IT Automation Pipeline such as Ansible or Terraform.

Downloading the nvwb-cli tool is straightforward; execute the following commands:

bash
mkdir -p $HOME/.nvwb/bin && \
curl -L https://workbench.download.nvidia.com/stable/workbench-cli/$(curl -L -s https://workbench.download.nvidia.com/stable/workbench-cli/LATEST)/nvwb-cli-$(uname)-$(uname -m) --output $HOME/.nvwb/bin/nvwb-cli && \
chmod +x $HOME/.nvwb/bin/nvwb-cli

The commands above:

  • Create a destination folder for the nvwb-cli
  • Fetch the correct nvwb-cli binary for our operating system and architecture (here: Ubuntu 22.04 x86_64)
  • Make the nvwb-cli executable

Now that we have the CLI downloaded, we can install AI Workbench in non-interactive mode by running a single command:

If you want to use the Text User Interface instead, please follow this tutorial here.

bash
sudo -E $HOME/.nvwb/bin/nvwb-cli install \
--noninteractive \
--docker \
--drivers \
--accept

You should see the following output, followed by a confirmation that the installation was successful:

text
Starting installation. Please wait.

Installation complete.

The command above does the following:

  • Configures nvwb-cli install to run in non-interactive mode
  • Installs AI Workbench using Docker for containerization
  • Installs NVIDIA GPU Drivers
  • Accepts the NVIDIA AI Workbench EULA

The installation might take up to 15 minutes since we're installing the NVIDIA GPU Drivers as well.

After the installation has finished, let's reboot the instance:

bash
sudo reboot

You will lose connectivity to the instance; this is expected.
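Once the instance is back up, reconnect and verify that the GPU driver and Docker were installed correctly:

bash
ssh -i nvidia_ai_workbench_key_01.pem ubuntu@ec2-....compute.amazonaws.com
nvidia-smi         # should list the Tesla T4 and the installed driver version
docker --version   # Docker was installed by the nvwb-cli installer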

Configuring the new Remote in Local AI Workbench

We're done with configuring the Remote AI Workbench instance, now let's test the connectivity.

Open up the NVIDIA AI Workbench Desktop Application on your local machine. Click the Add Remote Location button, and enter the following details:

If you haven't installed AI Workbench locally yet, follow this tutorial and come back here when done -> NVIDIA - AI Workbench Docs - Installation

  • Location Name: your choice, here we'll use EC2 Remote AI Workbench
  • Description: your choice, here we'll use g4dn.xlarge instance with NVIDIA AI Workbench installed
  • Hostname or IP Address: use the same hostname you used earlier to connect via SSH, in our case: ec2-....compute.amazonaws.com
  • SSH Port: 22
  • SSH Username: ubuntu (AWS Default)
  • SSH Key File: the same key file used above, nvidia_ai_workbench_key_01.pem
  • Location of the .nvwb directory on the Remote system: /home/ubuntu/.nvwb

Select Add Location. You should now be able to see the remote instance show up as a location in your AI Workbench.

And that's it! We now have NVIDIA AI Workbench running on a GPU-enabled EC2 instance, and accessible from our local machine.
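As flagged in the cost warning at the top, a g4dn.xlarge accrues charges whenever it is running. When you're done experimenting, stop (don't terminate) the instance; the instance ID below is a placeholder for your own:

bash
# Stop the instance to pause compute charges (EBS storage is still billed)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0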

In the next iteration of our NVIDIA AI Workbench series, we will look at setting up development environments and using Jupyter Notebooks to create a toy Retrieval-Augmented Generation (RAG) demo.

Thank you for following! If this post has been helpful, consider sharing it with your peers!