NVIDIA CUDA Install on Ubuntu 18.04 / 20.04
CUDA has become an integral part of data-intensive processing. Its installation, however, is still cumbersome, and the procedure changes whenever new versions are released.
Several resources on the web guide you through the installation, each tailored to a different purpose. First and foremost, consult the official NVIDIA documentation.
The following installation procedure is tailored to Ubuntu 18.04 and 20.04.
I advise installing the CUDA driver manually, especially if you are on a compute node and don't need the display drivers. Otherwise, an automatic driver update may leave the system unstable and the driver incompatible with the installed nvcc.
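Once both the toolkit and the driver are installed, such an nvcc incompatibility can be detected by comparing the toolkit release reported by nvcc --version with the highest CUDA version the driver supports, which nvidia-smi prints in its header. The following is a minimal sketch, assuming bash and GNU coreutils (sort -V). The function names are hypothetical, and the parsing relies on the output formats shown in this post, which may differ in other releases:

```shell
#!/usr/bin/env bash
# Sketch: warn if the installed toolkit (nvcc) is newer than what the
# loaded driver supports, e.g. after an automatic driver update.
# Function names are hypothetical; parsing follows the output shown in this post.

# Read nvcc output on stdin, print "X.Y" from
# "Cuda compilation tools, release 10.2, V10.2.89".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Read nvidia-smi output on stdin, print "X.Y" from the header field
# "CUDA Version: 10.2" (the highest CUDA version the driver supports).
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 <= version $2 (GNU sort -V ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Live check, only if both tools are available on the PATH.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
  toolkit=$(nvcc --version | nvcc_release)
  driver=$(nvidia-smi | driver_cuda_version)
  if [ -z "$toolkit" ] || [ -z "$driver" ]; then
    echo "Could not parse nvcc or nvidia-smi output" >&2
  elif version_le "$toolkit" "$driver"; then
    echo "OK: toolkit $toolkit, driver supports CUDA up to $driver"
  else
    echo "Mismatch: toolkit $toolkit is newer than the driver (CUDA $driver)" >&2
  fi
fi
```

The check is deliberately conservative: it flags any toolkit newer than the CUDA version the driver reports and does not account for the compatibility options of newer CUDA releases.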
General steps to install CUDA Drivers and Toolkits on Ubuntu Systems
It is best to install the CUDA 10.2 or 11.0 drivers directly from the NVIDIA webpage. On Ubuntu 18.04 we use CUDA 10.2, because 18.04 still runs GCC 8; on Ubuntu 20.04 we use CUDA 11.0 (GCC 9). By updating your GCC environment, you can also run CUDA 11.0 on Ubuntu 18.04. We have not noticed any differences between the two versions.
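The version choice can be scripted. The following is a minimal sketch, assuming bash and the VERSION_ID field of /etc/os-release. The function names are hypothetical, the URLs are the runfile installers listed in the steps below, and only the default pairing is encoded (10.2 on 18.04, 11.0 on 20.04), not the variant of 11.0 on 18.04 with an updated GCC:

```shell
#!/usr/bin/env bash
# Sketch: pick the CUDA release used in this post for the running Ubuntu
# version and print its runfile URL. Function names are hypothetical.

# Ubuntu release -> CUDA version (default pairing from this post).
cuda_for_ubuntu() {
  case "$1" in
    18.04) echo "10.2" ;;
    20.04) echo "11.0" ;;
    *)     return 1 ;;  # release not covered here
  esac
}

# CUDA version -> local runfile installer URL (as listed in this post).
cuda_runfile_url() {
  local base="http://developer.download.nvidia.com/compute/cuda"
  case "$1" in
    10.2) echo "$base/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run" ;;
    11.0) echo "$base/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run" ;;
    *)    return 1 ;;
  esac
}

# Read VERSION_ID (e.g. "18.04") in a subshell so the caller's variables stay clean.
if [ -r /etc/os-release ]; then
  release=$(. /etc/os-release && echo "$VERSION_ID")
  if cuda=$(cuda_for_ubuntu "$release"); then
    echo "Ubuntu $release: CUDA $cuda from $(cuda_runfile_url "$cuda")"
  else
    echo "Ubuntu release '$release' is not covered by this post" >&2
  fi
fi
```

The printed URL can then be passed to wget.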
- Uninstall all NVIDIA drivers installed from the Ubuntu repositories:
sudo apt-get purge 'nvidia-*'
If (for some reason) you want to keep the drivers, remove only the CUDA packages instead:
sudo apt-get purge 'nvidia-cuda*'
- Download the CUDA installer from the NVIDIA webpage. For Ubuntu 18.04, download CUDA 10.2 or run
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
and fetch the 10.2 patch of Aug 26, 2020 with
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/patches/1/cuda_10.2.1_linux.run
For Ubuntu 20.04, download CUDA 11.0 or run
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
- Before installation: if you also use the NVIDIA card for the X server, you have to stop the X server. Switch to a console with Ctrl-Alt-F2, log in, and stop the X server with
sudo init 3
or, in some cases, with
sudo service lightdm stop
Compute-only servers should not be running an X server in the first place.
- Install the driver (here for 10.2). Make sure you are root:
sudo -i
Then run the installer:
sh ./cuda_10.2.89_440.33.01_linux.run
and apply the patch:
sh ./cuda_10.2.1_linux.run
- Add the CUDA binary directory to your PATH variable:
export PATH=$PATH:/usr/local/cuda-10.2/bin
- Make sure that /usr/local/cuda-10.2/lib64 is either in LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
or registered with the dynamic linker through a file in /etc/ld.so.conf.d:
echo /usr/local/cuda-10.2/lib64 > /etc/ld.so.conf.d/cuda.conf
followed by
ldconfig
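- Reboot:
sudo reboot
You should now be running a fairly recent NVIDIA driver (440.33) and have all NVIDIA tools installed. Note that the export commands above only affect the current shell, so repeat the PATH export after the reboot unless you made it permanent.
- Verify the installation by running
nvcc --version
and
nvidia-smi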
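To avoid repeating the exports in every new shell, the settings can be written to system-wide configuration files. The following is a minimal sketch, assuming bash, root privileges, and that login shells source /etc/profile.d/*.sh (as they do on Ubuntu). The function names and the file name /etc/profile.d/cuda.sh are hypothetical choices:

```shell
#!/usr/bin/env bash
# Sketch: make the PATH and library settings from the steps above permanent.
# Function names and /etc/profile.d/cuda.sh are assumptions, not NVIDIA defaults.

# Profile line that adds the CUDA binaries (nvcc, ...) to PATH.
cuda_profile_line() {
  echo "export PATH=\$PATH:/usr/local/cuda-$1/bin"
}

# Library directory to register with the dynamic linker.
cuda_lib_dir() {
  echo "/usr/local/cuda-$1/lib64"
}

# Write both settings for CUDA version $1 and refresh the linker cache.
install_cuda_env() {
  local version="$1"
  if [ "$(id -u)" -ne 0 ]; then
    echo "install_cuda_env: must be run as root" >&2
    return 1
  fi
  cuda_profile_line "$version" > /etc/profile.d/cuda.sh
  cuda_lib_dir "$version" > /etc/ld.so.conf.d/cuda.conf
  ldconfig
}

# Usage (as root, e.g. after sudo -i):
#   install_cuda_env 10.2
```

New login shells pick up the PATH setting automatically; in an already open shell, source /etc/profile.d/cuda.sh once.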
With nvidia-smi, you should see something along these lines (system aconcagua):
Sun Oct 18 07:36:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   35C    P0    34W / 250W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   35C    P0    36W / 250W |      0MiB / 32510MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
or on system kailash:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P0    51W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           Off  | 00000000:83:00.0 Off |                    0 |
| N/A   26C    P0    47W / 250W |      0MiB / 22919MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
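If you only need the key facts from this table, for example in a monitoring script, nvidia-smi can print selected fields as CSV via its --query-gpu and --format options. The following is a minimal sketch, assuming bash and awk. summarize_gpus is a hypothetical helper, and it assumes the ", " separator that nvidia-smi uses in CSV mode and GPU names without commas:

```shell
#!/usr/bin/env bash
# Sketch: compact per-GPU summary instead of the full nvidia-smi table.
# summarize_gpus is a hypothetical helper; it assumes GPU names contain no ", ".

# Read CSV rows "index, name, driver_version, memory.total" (MiB, no units)
# and print one line per GPU plus a total over all GPUs.
summarize_gpus() {
  awk -F', ' '
    { printf "GPU %s: %s, driver %s, %d MiB\n", $1, $2, $3, $4; total += $4 }
    END { printf "%d GPU(s), %d MiB total\n", NR, total }'
}

# Live query, only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,driver_version,memory.total \
             --format=csv,noheader,nounits | summarize_gpus
fi
```

Fed with two rows built from the kailash table above (two Tesla P40 cards with 22919 MiB each), the last line reads "2 GPU(s), 45838 MiB total".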