Set Up a Remote NVIDIA AI Workbench Node using EC2
A new environment for GPU-enabled experimentation
NVIDIA's AI Workbench is a tool that allows developers and data science teams, and beginners in particular, to iterate quickly on experiments and ship successful ones.
Making use of GPU acceleration for Machine Learning workloads, whether training or inference, often brings orders-of-magnitude improvements in speed and latency, and ultimately in how quickly developers can iterate on experiments.
However, setting up CUDA-enabled GPU nodes has traditionally had a high barrier to entry, as GPU drivers, the underlying host architecture and additional virtualization layers needed to be aligned. AI Workbench removes the complexity that often derails beginners from deploying GPU-accelerated Machine Learning workloads. By shipping with straightforward installation scripts, AI Workbench allows you to go from fresh EC2 Instance to a fully-configured, remotely accessible playground in minutes.
Though NVIDIA's own documentation is very detailed on setting up a Remote installation of AI Workbench, today we'll provide a step-by-step tutorial on how to set up an EC2 Instance to be a Remote AI Workbench compute node.
AMIs, Compute Nodes and other Prerequisites
⚠️ Note: by continuing with this tutorial you will be experimenting with AWS services that go beyond the AWS Free Tier limits, and you will incur EC2 costs. Always plan your AWS costs ahead of time to avoid unexpected charges.
ℹ️ Before we begin listing the prerequisites for AI Workbench, developers should be aware that AWS often places limits on certain instance types in order to prevent accidental deployment and runaway costs. Before you can deploy a GPU-enabled EC2 instance such as the `g4dn.xlarge` (NVIDIA Tesla T4 GPU), you should follow this AWS document, which details the steps needed to issue a service quota increase for the AWS EC2 resource.
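If you prefer the terminal, a quota increase can also be requested via the AWS CLI. The quota code below is the one the "Running On-Demand G and VT instances" quota is commonly listed under; verify the code for your account and region in the Service Quotas console before submitting:

```bash
# Request 4 vCPUs for On-Demand G and VT instances -- enough for one g4dn.xlarge.
# Verify the quota code in the Service Quotas console before running this.
aws service-quotas request-service-quota-increase \
    --service-code ec2 \
    --quota-code L-DB2E81BA \
    --desired-value 4
```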
NVIDIA's own requirements for deploying a Remote instance of AI Workbench are:
- Ubuntu 22.04 operating system
- 16GB of RAM
- 500MB disk space for the NVIDIA AI Workbench application
- 30 to 40GB of disk space for containers
- Access: SSH access as a non-root user with `sudo` privileges via a public/private key pair
  - The private key can't be password-protected
Firstly, the operating system. AI Workbench requires at least Ubuntu 22.04. AWS conveniently provides ready-built AMIs (Amazon Machine Images) for different operating systems.
To find a compatible Ubuntu 22.04 AMI, navigate to AWS Console > EC2 > AMI Catalog > Community AMIs, where we will search for `ubuntu 22.04`.
Great! We've found an AMI that is Ubuntu 22.04 running on x86_64 architecture. Let's make note of that AMI ID: ami-06c4be2792f419b7b
ℹ️ Note: not all AMIs are available in all AWS regions. Before deciding to deploy to a particular region, ensure that an Ubuntu 22.04 AMI is available in that region.
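If you have the AWS CLI configured, a quick way to confirm availability is to query Canonical's official images directly. This is a sketch assuming eu-central-1 as the target region; adjust the region to your own:

```bash
# List the newest Canonical Ubuntu 22.04 (Jammy) x86_64 AMI in the chosen region.
# 099720109477 is Canonical's AWS account ID.
aws ec2 describe-images \
    --owners 099720109477 \
    --region eu-central-1 \
    --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
              "Name=architecture,Values=x86_64" \
    --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
    --output text
```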
Now that we have our Operating System sorted, let's fulfill the other prerequisites.
For this we will look for EC2 instances that have NVIDIA GPUs. This AWS Article is a helpful guide offering a mapping between AWS Instance Type and NVIDIA GPU type.
⚠️ For running AI Workbench, an NVIDIA GPU on an x86_64 host is required, therefore Arm-based AWS Graviton2 `g5g` instances, as well as AWS Inferentia (`Inf1`, `Inf2`) and AWS Trainium (`Trn1`, `Trn1n`) instances, will not be compatible.
For this tutorial, we will go with a cost-effective `g4dn.xlarge` instance, which features an NVIDIA Tesla T4 GPU and fulfills all the system requirements:
| Requirement | AI Workbench | g4dn.xlarge specs |
|---|---|---|
| RAM | 16GB | 16GB |
| Storage | 500MB + ~30-40GB | 125GB NVMe SSD |
Let's launch this instance on AWS:
Head to AWS Console > EC2 > Instances > Launch instances:
We will provide the following configuration:
- Name: NVIDIA AI Workbench 01
- AMI: (ID we identified above, in our case ami-06c4be2792f419b7b, selecting x86_64 for architecture)
- Instance Type: g4dn.xlarge
- Key Pair: Create new key pair >
- Key pair name: nvidia_ai_workbench_01
- Key pair type: RSA
- Private key file format: .pem (for use with OpenSSH)
ℹ️ Save this key for later
- Network settings: (keep your default settings)
- Allow SSH traffic from: Anywhere 0.0.0.0/0
- Storage: 1x 8GiB gp2 Root volume
Click Launch instance; after a few seconds your instance should be in the Running state.
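If you would rather script the launch, a roughly equivalent AWS CLI call is sketched below. The security group ID is a placeholder, and the AMI ID and key pair name are the ones used in this walkthrough; substitute your own values:

```bash
# Launch a g4dn.xlarge from the Ubuntu 22.04 AMI with the key pair created above.
# sg-0123456789abcdef0 is a hypothetical security group that allows inbound SSH.
aws ec2 run-instances \
    --image-id ami-06c4be2792f419b7b \
    --instance-type g4dn.xlarge \
    --key-name nvidia_ai_workbench_01 \
    --security-group-ids sg-0123456789abcdef0 \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=8,VolumeType=gp2}' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=nvidia-ai-workbench-01}]'
```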
Accessing our Remote instance
Finally, we're ready to install AI Workbench on our GPU-enabled EC2 Instance that we've just deployed.
Using the SSH key we created and downloaded earlier, let's connect to our instance via SSH.
Click on your new instance and find its Public IPv4 DNS:
We will have to restrict the permissions on the key that we've just created in AWS in order for our SSH client to allow us to log in to the remote instance.
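Assuming the key file from the launch step, nvidia_ai_workbench_01.pem, sits in your current directory:

```bash
# Make the key readable only by its owner; SSH refuses keys with looser permissions.
chmod 400 nvidia_ai_workbench_01.pem
```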
Now let's log in via SSH:
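A minimal example, using the default ubuntu user for Ubuntu AMIs and your instance's Public IPv4 DNS:

```bash
# Replace <public-ipv4-dns> with the Public IPv4 DNS shown in the EC2 console.
ssh -i nvidia_ai_workbench_01.pem ubuntu@<public-ipv4-dns>
```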
Great! We're in!
Installing AI Workbench Remote
NVIDIA provides a convenient install script that we can fetch from their repository.
And while NVIDIA's Documentation provides clear instructions on installing AI Workbench using the Text User Interface (TUI), we will demonstrate how AI Workbench can be installed via Command-Line Interface (CLI) flags.
ℹ️ Using the CLI is useful if installing AI Workbench is part of an IT Automation Pipeline such as Ansible or Terraform.
Downloading the `nvwb-cli` tool is straightforward and only takes a few commands.
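The snippet below mirrors the download steps described in NVIDIA's documentation; double-check the download URL against the current docs before running it:

```bash
# Create a destination folder for the CLI
mkdir -p $HOME/.nvwb/bin

# Fetch the latest nvwb-cli build matching this OS and CPU architecture
curl -L "https://workbench.download.nvidia.com/stable/workbench-cli/$(curl -L -s https://workbench.download.nvidia.com/stable/workbench-cli/LATEST)/nvwb-cli-$(uname)-$(uname -m)" \
    --output $HOME/.nvwb/bin/nvwb-cli

# Allow the binary to be executed
chmod +x $HOME/.nvwb/bin/nvwb-cli
```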
The commands above:
- create a destination folder for the `nvwb-cli`
- fetch the correct `nvwb-cli` build for our operating system and architecture (here: Ubuntu 22.04 on x86_64)
- allow the `nvwb-cli` binary to be executed
Now that we have the CLI downloaded, we can install AI Workbench in non-interactive mode by running one single command:
ℹ️ If you want to use the Text User Interface instead, please follow this tutorial here
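The command below follows the flags documented for NVIDIA's non-interactive remote install; verify them against `nvwb-cli install --help` if your version differs:

```bash
# Non-interactive install: accept the EULA, use Docker, and install the NVIDIA GPU drivers.
sudo -E $HOME/.nvwb/bin/nvwb-cli install \
    --noninteractive \
    --docker \
    --drivers \
    --accept
```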
The installer will print progress output, followed by a confirmation that the installation was successful.
The command above does the following:
- Configures `nvwb-cli install` to run in non-interactive mode
- Installs AI Workbench using Docker for containerization
- Installs NVIDIA GPU Drivers
- Accepts the NVIDIA AI Workbench EULA
The installation might take up to 15 minutes since we're installing the NVIDIA GPU Drivers as well.
After the installation has finished, let's reboot the instance:
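```bash
# Reboot so the freshly installed GPU drivers are loaded.
sudo reboot
```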
You will lose connectivity to the instance; this is expected.
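Once the instance is back in the Running state, you can optionally reconnect and confirm that the driver sees the GPU:

```bash
# Reconnect after the reboot and check that the Tesla T4 is visible to the driver.
ssh -i nvidia_ai_workbench_01.pem ubuntu@<public-ipv4-dns> nvidia-smi
```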
Configuring the new Remote in Local AI Workbench
We're done with configuring the Remote AI Workbench instance; now let's test the connectivity.
Open up the NVIDIA AI Workbench Desktop Application on your local machine. Click the Add Remote Location button, and enter the following details:
ℹ️ If you haven't installed AI Workbench locally yet, follow this tutorial and come back here when done -> NVIDIA - AI Workbench Docs - Installation
- Location Name: your choice, here we'll use EC2 Remote AI Workbench
- Description: your choice, here we'll use g4dn.xlarge instance with NVIDIA AI Workbench installed
- Hostname or IP Address: use the same hostname you used earlier to connect via SSH, in our case: ec2-....compute.amazonaws.com
- SSH Port: 22
- SSH Username: ubuntu (AWS Default)
- SSH Key File: the same key file used above, nvidia_ai_workbench_01.pem
- Location of the `.nvwb` directory on the Remote system: /home/ubuntu/.nvwb
Select Add Location. You should now be able to see the remote Launchpad instance show up as a location in your AI Workbench.
And that's it! We now have NVIDIA AI Workbench running on a GPU-enabled EC2 instance, and accessible from our local machine.
In the next iteration of our NVIDIA AI Workbench series, we will be looking at setting up development environments and using Jupyter Notebooks to create a Toy Retrieval-Augmented Generation (RAG) demo.
Thank you for following! If this post has been helpful, consider sharing it with your peers!