Environment
kernel-3.10.0-1160.119.1.el7.x86_64
kernel-devel-3.10.0-1160.119.1.el7.x86_64
CentOS 7.9.2009
Linux x64 (AMD64/EM64T) Display Driver 550.107.02
GeForce RTX 2080 Ti
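Kernel versions 5.4.227 and 6.9.7 were also tested, and the driver could not be installed on either. After downgrading back to 3.10.0, the installation worked.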
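The environment above pairs the kernel with a kernel-devel package of exactly the same release, which matters because the driver installer builds a kernel module against those sources. A minimal sketch of a pre-flight check follows, assuming kernel-devel places its tree under /usr/src/kernels/<release> as it does on CentOS 7; the function name `has_kernel_devel` is hypothetical.

```shell
# Return success when kernel-devel sources exist for the given kernel release.
# Arguments: [release] [kernels_dir]
#   release      defaults to the running kernel (uname -r)
#   kernels_dir  defaults to /usr/src/kernels (assumed CentOS layout)
has_kernel_devel() {
    local release="${1:-$(uname -r)}"
    local kernels_dir="${2:-/usr/src/kernels}"
    [ -d "${kernels_dir}/${release}" ]
}
```

For example, `has_kernel_devel || echo "install the kernel-devel package matching $(uname -r) first"` could be run before starting the driver installer.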
Check the GPU
lspci | grep -i nvidia
d8:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
d8:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
d8:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
d8:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a
Disable the graphics driver
# Check whether it exists
# Apply the changes
# Remove
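Only the comments of these steps are shown above. As a hedged sketch of one common way to disable the open-source nouveau driver on CentOS 7 (not necessarily the exact commands used here): check whether the module is loaded, write a modprobe blacklist file, then rebuild the initramfs and reboot. The file name blacklist-nouveau.conf and both function names are assumptions for illustration.

```shell
# Succeeds when the nouveau kernel module is currently loaded.
nouveau_loaded() {
    lsmod | grep -q '^nouveau'
}

# Write a modprobe configuration that blacklists nouveau.
# The default path is an assumption; pass another path to try it out safely.
write_nouveau_blacklist() {
    local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# After writing the file as root, the change takes effect once the
# initramfs has been rebuilt and the machine rebooted:
#   dracut --force
#   reboot
```

After the reboot, `nouveau_loaded` should fail, which indicates the module is no longer in use.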
Download the driver
URL: https://www.nvidia.cn/Download/index.aspx?lang=cn
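Install the driver
# Grant execute permission
-no-x-check: do not check for a running X server during installation; optional here, because the graphical interface is already disabled
-no-opengl-files: install only the driver files, without the OpenGL files
-no-nouveau-check: skip the nouveau check during installation; optional here, because nouveau is already disabled
--kernel-source-path: location of the kernel source package, which the installer modifies during installation; without this option, the sources of the currently running kernel are used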
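The options above can be combined into a single installer invocation. The sketch below only prints the command instead of running it. It assumes the downloaded file is named NVIDIA-Linux-x86_64-550.107.02.run (NVIDIA's usual naming for this driver version) and writes the options in the double-dash form shown by the installer's `--advanced-options` help. The function name `driver_install_cmd` is hypothetical.

```shell
# Build the driver installer command line as a string (dry run; nothing is executed).
# Arguments: [installer] [kernel_source_path]
#   installer           path to the .run file (assumed file name by default)
#   kernel_source_path  optional; when empty the installer uses the running kernel
driver_install_cmd() {
    local installer="${1:-./NVIDIA-Linux-x86_64-550.107.02.run}"
    local kernel_src="${2:-}"
    local cmd="sh ${installer} --no-x-check --no-opengl-files --no-nouveau-check"
    if [ -n "$kernel_src" ]; then
        cmd="${cmd} --kernel-source-path=${kernel_src}"
    fi
    echo "$cmd"
}
```

Calling `driver_install_cmd` with no arguments prints the basic command; passing a second argument such as /usr/src/kernels/3.10.0-1160.119.1.el7.x86_64 appends the kernel source option. The printed line would then be run as root after `chmod +x` on the installer.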
Verify the driver
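One way to verify the driver, sketched under the assumption that `nvidia-smi` was installed together with the driver, is to query the GPU name and driver version. The wrapper name `driver_version` is hypothetical; the `--query-gpu` and `--format` options are part of `nvidia-smi`.

```shell
# Print "<GPU name>, <driver version>" for each GPU, or fail with a
# message when nvidia-smi is not available on this machine.
driver_version() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver is probably not installed" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

If the installation succeeded, the reported driver version should match the installed package (550.107.02 in this setup). Running plain `nvidia-smi` gives the full status table instead.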
Install the CUDA Toolkit
Version information: CUDA 12.6 Release Notes (nvidia.com)
Corresponding driver versions
This installation uses v12.4 GA for CentOS 7: CUDA Toolkit 12.4 Downloads | NVIDIA Developer
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
Known issues
Installation failed. See log at /var/log/cuda-installer.log for details.
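Troubleshooting
Solution:
# View the log
[INFO]: Initializing menu
The NVIDIA driver installation step failed; following the hint in that log, check the driver installer log next
cat /var/log/nvidia-installer.log
Using built-in stream user interface
Process 2843 is using the driver. Kill that process, then reinstall CUDA.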
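To find such a process without reading the installer log, one can look for processes that hold an NVIDIA device file open. The sketch below scans /proc for open file descriptors whose target starts with a given prefix. It assumes a Linux /proc filesystem and must run as root to see other users' processes. The function name `pids_holding` is hypothetical; `fuser -v /dev/nvidia*` is a commonly used alternative where psmisc is installed.

```shell
# Print the PIDs of processes that hold open a file whose path starts with
# the given prefix. The default prefix /dev/nvidia covers device nodes such
# as /dev/nvidia0 and /dev/nvidiactl.
pids_holding() {
    local prefix="${1:-/dev/nvidia}" fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped.
        target="$(readlink "$fd" 2>/dev/null)" || continue
        case "$target" in
            "$prefix"*)
                pid="${fd#/proc/}"     # strip the leading /proc/
                echo "${pid%%/*}"      # keep only the PID component
                ;;
        esac
    done | sort -un
}
```

Each PID printed by `pids_holding` can be inspected with `ps -fp <pid>` before deciding whether to stop it with `kill <pid>`.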
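Issue 2
ERROR: The installation was canceled due to the availability or presence of an alternate driver installation. Please see /var/log/nvidia-installer.log for more details.
The CUDA version is lower than the NVIDIA driver version: the CUDA installer bundles driver 550.54.14, while driver 550.107.02 is already installed.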
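The conflict can be detected before running the installer by comparing the driver version embedded in the runfile name with the installed one. The sketch below assumes the runfile follows the cuda_<toolkit>_<driver>_linux.run naming seen in the download URL, and it relies on GNU `sort -V`. When the bundled driver is older, it prints a command that installs only the toolkit, using the runfile's `--silent` and `--toolkit` options. All three function names are hypothetical.

```shell
# Extract the bundled driver version from a CUDA runfile name,
# e.g. cuda_12.4.0_550.54.14_linux.run -> 550.54.14
bundled_driver_version() {
    local name="${1##*/}"       # drop any directory part
    name="${name#cuda_*_}"      # drop "cuda_<toolkit version>_"
    echo "${name%_linux.run}"   # drop the "_linux.run" suffix
}

# True when version $1 is strictly older than version $2 (GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Print a suggested installer command (dry run; nothing is executed).
# Arguments: runfile installed_driver_version
cuda_install_cmd() {
    local runfile="$1" installed="$2"
    if version_lt "$(bundled_driver_version "$runfile")" "$installed"; then
        # Bundled driver is older: keep the installed driver, install the toolkit only.
        echo "sh ${runfile} --silent --toolkit"
    else
        echo "sh ${runfile}"
    fi
}
```

With the versions from this setup, `cuda_install_cmd cuda_12.4.0_550.54.14_linux.run 550.107.02` suggests the toolkit-only form. In the interactive installer, the equivalent is to deselect the Driver component.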
Verify CUDA
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
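For scripted checks, the release number can be pulled out of this output. A minimal sketch follows, assuming the "release X.Y," wording shown above; the function name `nvcc_release` is hypothetical.

```shell
# Read `nvcc --version` output on stdin and print the toolkit release,
# taken from the "Cuda compilation tools, release X.Y, ..." line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

A usage example would be `/usr/local/cuda/bin/nvcc --version | nvcc_release`, whose result can then be compared with the version that was meant to be installed. Adding /usr/local/cuda/bin to PATH (for example `export PATH=/usr/local/cuda/bin:$PATH` in the shell profile) lets `nvcc` be called without the full path.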