CUDA Arch and Gencode Mapping for Different Architectures

Fermi†  Kepler†  Maxwell‡  Pascal  Volta  Turing  Ampere  Hopper*  Lovelace
sm_20   sm_30    sm_50     sm_60   sm_70  sm_75   sm_80   sm_90    sm_89
        sm_35    sm_52     sm_61   sm_72          sm_86
        sm_37    sm_53     sm_62                  sm_87

† Fermi and Kepler are deprecated from CUDA 9 and CUDA 11 onwards, respectively.

‡ Maxwell is deprecated from CUDA 11.6 onwards.

* Hopper is NVIDIA’s “tesla-next” series, built on a 5nm process, replacing Ampere.
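The table above can be read as a lookup from an sm version to its architecture family. A minimal sketch follows, covering the families Fermi through Hopper that the sections below describe. The names SM_BY_FAMILY and arch_family are hypothetical and are not part of any NVIDIA tool.

```python
# Hypothetical helper (not an NVIDIA API): the table above as a dictionary.
SM_BY_FAMILY = {
    "Fermi": (20,),
    "Kepler": (30, 35, 37),
    "Maxwell": (50, 52, 53),
    "Pascal": (60, 61, 62),
    "Volta": (70, 72),
    "Turing": (75,),
    "Ampere": (80, 86, 87),
    "Hopper": (90,),
}


def arch_family(name: str) -> str:
    """Return the family for a string such as 'sm_86', 'SM75' or 'compute_37'."""
    # Keep only the digits, so 'sm_86', 'SM86' and 'compute_86' all become 86.
    digits = "".join(ch for ch in name if ch.isdigit())
    if not digits:
        raise ValueError(f"no version number found in {name!r}")
    number = int(digits)
    for family, versions in SM_BY_FAMILY.items():
        if number in versions:
            return family
    raise KeyError(f"sm_{number} is not in the table")
```

For example, arch_family("sm_86") returns "Ampere", and a version that is not in the table raises KeyError rather than guessing.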

Fermi cards (CUDA 3.2 until CUDA 8)

Deprecated from CUDA 9, support completely dropped from CUDA 10.

  • SM20 or SM_20, compute_20 – GeForce 400, 500, 600, GT-630. Completely dropped from CUDA 10 onwards.

Kepler cards (CUDA 5 until CUDA 10)

Deprecated from CUDA 11.

  • SM30 or SM_30, compute_30 – Kepler architecture (e.g. generic Kepler, GeForce 700, GT-730). Adds support for unified memory programming. Completely dropped from CUDA 11 onwards.
  • SM35 or SM_35, compute_35 – Tesla K40. Adds support for dynamic parallelism. Deprecated from CUDA 11, will be dropped in future versions.
  • SM37 or SM_37, compute_37 – Tesla K80. Adds a few more registers. Deprecated from CUDA 11 and will be dropped in future versions; replacing it with a 32GB PCIe Tesla V100 is strongly suggested.

Maxwell cards (CUDA 6 until CUDA 11)

  • SM50 or SM_50, compute_50 – Tesla/Quadro M series. Deprecated from CUDA 11 and will be dropped in future versions; replacing it with a Quadro RTX 4000 or A6000 is strongly suggested.
  • SM52 or SM_52, compute_52 – Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X.
  • SM53 or SM_53, compute_53 – Tegra (Jetson) TX1 / Tegra X1, Drive CX, Drive PX, Jetson Nano.

Pascal (CUDA 8 and later)

  • SM60 or SM_60, compute_60 – Quadro GP100, Tesla P100, DGX-1 (Generic Pascal)
  • SM61 or SM_61, compute_61 – GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
  • SM62 or SM_62, compute_62 – Integrated GPU on the NVIDIA Drive PX2, Tegra (Jetson) TX2

Volta (CUDA 9 and later)

  • SM70 or SM_70, compute_70 – DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100
  • SM72 or SM_72, compute_72 – Jetson AGX Xavier, Drive AGX Pegasus, Xavier NX

Turing (CUDA 10 and later)

  • SM75 or SM_75, compute_75 – GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4

Ampere (CUDA 11 and later)

  • SM80 or SM_80, compute_80 – NVIDIA A100 (the name “Tesla” has been dropped – GA100), NVIDIA DGX-A100
  • SM86 or SM_86, compute_86 – (from CUDA 11.1 onwards) Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, NVIDIA A10, NVIDIA A16, A2 Tensor Core GPU
  • SM87 or SM_87, compute_87 – (from CUDA 11.4 onwards) Jetson AGX Orin and Drive AGX Orin

“Devices of compute capability 8.6 have 2x more FP32 operations per cycle per SM than devices of compute capability 8.0. While a binary compiled for 8.0 will run as is on 8.6, it is recommended to compile explicitly for 8.6 to benefit from the increased FP32 throughput.”

https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html#improved_fp32
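Compiling explicitly for both 8.0 and 8.6, as the quote recommends, means passing one -gencode pair per target to nvcc. The sketch below builds such a flag list; gencode_flags is a hypothetical helper name, and the optional trailing PTX entry (code=compute_XX) is an assumption about what a build might want for forward compatibility, not something the text above prescribes.

```python
def gencode_flags(sm_versions, ptx_for_newest=True):
    """Build nvcc -gencode arguments for the given sm versions (e.g. [80, 86])."""
    ordered = sorted(set(sm_versions))
    flags = []
    for sm in ordered:
        # Native machine code (SASS) for this exact compute capability.
        flags += ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]
    if ptx_for_newest and ordered:
        newest = ordered[-1]
        # Assumption: also embed PTX for the newest target so that later GPUs
        # can JIT-compile it.
        flags += ["-gencode", f"arch=compute_{newest},code=compute_{newest}"]
    return flags
```

Joining the result with spaces gives the arguments to place on the nvcc command line in front of the source file, for example " ".join(gencode_flags([80, 86])).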

Hopper (CUDA 12 and later)

  • SM90 or SM_90, compute_90 – NVIDIA H100 (GH100)

Common Products on the Market and Their Versions

  • Version 52: Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X
  • Version 53: Tegra (Jetson) TX1 / Tegra X1, Drive CX, Drive PX, Jetson Nano
  • Version 60: Quadro GP100, Tesla P100, DGX-1 (Generic Pascal)
  • Version 61: GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
  • Version 62: Integrated GPU on the NVIDIA Drive PX2, Tegra (Jetson) TX2
  • Version 70: DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100
  • Version 72: Jetson AGX Xavier, Drive AGX Pegasus, Xavier NX
  • Version 75: GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4
  • Version 80: NVIDIA A100 (the name “Tesla” has been dropped – GA100), NVIDIA DGX-A100
  • Version 86: Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, NVIDIA A10, NVIDIA A16, A2 Tensor Core GPU
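When building for a single known card, the product list above can serve as a lookup for nvcc's -arch shorthand. The sketch below copies only a handful of entries from the list; SM_BY_PRODUCT and arch_flag are hypothetical names, and the dictionary is meant to be extended by hand.

```python
# A few products copied from the list above (keys lower-cased); extend as needed.
SM_BY_PRODUCT = {
    "gtx-980": 52,
    "jetson nano": 53,
    "tesla p100": 60,
    "gtx 1080": 61,
    "tesla v100": 70,
    "tesla t4": 75,
    "nvidia a100": 80,
    "rtx 3090": 86,
}


def arch_flag(product: str) -> str:
    """Return the nvcc -arch flag for a product name; KeyError if it is unknown."""
    # Normalise case and collapse repeated whitespace before the lookup.
    key = " ".join(product.lower().split())
    return f"-arch=sm_{SM_BY_PRODUCT[key]}"
```

For instance, arch_flag("Tesla T4") gives "-arch=sm_75". An unknown product raises KeyError, which is deliberate: silently falling back to a default architecture could produce a binary that does not run on the card.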