Ampere (microarchitecture)

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Volta architecture, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère.[2][3] At the time of the announcement, it was unknown whether Ampere would be featured in Nvidia's expected GeForce RTX 30 family of cards, which was anticipated for release in Q4 2020.[1]

Nvidia Ampere
Fabrication process: TSMC 7 nm (FinFET)
History
Predecessor: Volta
Successor: Hopper

Details

Architectural improvements of the Ampere architecture include the following:

  • CUDA Compute Capability 8.0
  • TSMC's 7 nm FinFET process
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration[4]
  • High Bandwidth Memory 2 (HBM2)
  • NVLink 3.0 (50Gbps per pair)[4]
  • PCI Express 4.0 with SR-IOV support
  • Multi-Instance GPU (MIG) virtualization & GPU partitioning feature
  • PureVideo Feature Set K hardware video decoding
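
The NVLink 3.0 figure in the list above (50 Gbps per pair) can be related to a per-GPU interconnect total with simple arithmetic. The sketch below is illustrative only: the link count (12 per GPU) and the number of signal pairs per direction (4 per link) are assumptions that are not stated in this article, and the function name is hypothetical.

```python
# Rough NVLink 3.0 bandwidth arithmetic.
# From the feature list: 50 Gbit/s per signal pair.
# ASSUMPTIONS (not stated in the article): 12 links per GPU,
# 4 signal pairs per direction per link.
GBIT_PER_PAIR = 50
PAIRS_PER_DIRECTION = 4   # assumption
LINKS_PER_GPU = 12        # assumption
BITS_PER_BYTE = 8


def nvlink_total_gbytes_per_s(links=LINKS_PER_GPU,
                              pairs=PAIRS_PER_DIRECTION,
                              gbit_per_pair=GBIT_PER_PAIR):
    """Return the aggregate bidirectional bandwidth in GB/s."""
    # Bandwidth of one link in one direction, converted from Gbit/s to GB/s.
    per_direction = pairs * gbit_per_pair / BITS_PER_BYTE
    # Each link carries traffic in both directions.
    per_link = 2 * per_direction
    return links * per_link


print(nvlink_total_gbytes_per_s())  # 600.0
```

Under these assumptions the total comes to 600 GB/s, which matches the interconnect figure listed for the A100 in the accelerator comparison.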

A100 accelerator and DGX A100

The Ampere-based A100 accelerator was announced and released on May 14, 2020.[4] The A100 features 19.5 teraflops of FP32 performance, 6912 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[1] The A100 was initially available only in the third generation of the DGX server, the DGX A100, which includes eight A100s.[4] The DGX A100 also includes 15 TB of PCIe Gen 4 NVMe storage,[1] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[4]
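
The quoted FP32 figure can be approximated from the core count and the boost clock. The following is a minimal sketch, assuming each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock and using the 1410 MHz boost clock listed in the accelerator comparison; the function name is hypothetical.

```python
def peak_fp32_tflops(cuda_cores, boost_clock_mhz, flops_per_core_per_clock=2):
    """Estimate peak FP32 throughput in TFLOPS.

    ASSUMPTION: one fused multiply-add per core per clock,
    counted as 2 floating-point operations.
    """
    flops_per_second = cuda_cores * flops_per_core_per_clock * boost_clock_mhz * 1e6
    return flops_per_second / 1e12


a100 = peak_fp32_tflops(cuda_cores=6912, boost_clock_mhz=1410)
print(round(a100, 1))  # 19.5

# Eight A100s in one DGX A100, using the per-GPU figures from the text.
print(8 * 40)  # total GPU memory in GB: 320
```

The estimate is a theoretical peak; sustained throughput depends on the workload and is not covered by this arithmetic.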

Comparison of accelerators used in DGX:[4]

Accelerator                    | A100           | V100           | P100
Architecture                   | Ampere         | Volta          | Pascal
FP32 CUDA Cores                | 6912           | 5120           | 3584
Boost Clock                    | 1410 MHz       | 1530 MHz       | 1480 MHz
Memory Clock                   | 2.4 Gbps HBM2  | 1.75 Gbps HBM2 | 1.4 Gbps HBM2
Memory Bus Width               | 5120-bit       | 4096-bit       | 4096-bit
Memory Bandwidth               | 1555 GB/s      | 900 GB/s       | 720 GB/s
VRAM                           | 40 GB          | 16 GB/32 GB    | 16 GB
Single Precision               | 19.5 TFLOPS    | 15.7 TFLOPS    | 10.6 TFLOPS
Double Precision               | 9.7 TFLOPS     | 7.8 TFLOPS     | 5.3 TFLOPS
INT8 Tensor                    | 624 TOPS       | N/A            | N/A
FP16 Tensor                    | 312 TFLOPS     | 125 TFLOPS     | N/A
bfloat16 Tensor                | 312 TFLOPS     | N/A            | N/A
TensorFloat-32 (TF32) Tensor   | 156 TFLOPS     | N/A            | N/A
FP64 Tensor                    | 19.5 TFLOPS    | N/A            | N/A
Interconnect                   | 600 GB/s       | 300 GB/s       | 160 GB/s
GPU                            | GA100          | GV100          | GP100
GPU Die Size                   | 826 mm²        | 815 mm²        | 610 mm²
Transistor Count               | 54.2B          | 21.1B          | 15.3B
TDP                            | 400 W          | 300 W/350 W    | 300 W
Manufacturing Process          | TSMC 7nm N7    | TSMC 12nm FFN  | TSMC 16nm FinFET
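
The memory rows of the comparison are related by a simple formula: bandwidth is roughly the bus width multiplied by the per-pin data rate, divided by eight to convert bits to bytes. The sketch below applies that formula to the listed values; the function and variable names are hypothetical. The estimates land within about 2% of the listed bandwidths, and the small gaps are presumably due to rounding in the listed memory clocks.

```python
def memory_bandwidth_gbytes_per_s(bus_width_bits, data_rate_gbps):
    """Estimate memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, per-pin data rate in Gbps, listed bandwidth in GB/s),
# all taken from the comparison of accelerators.
accelerators = {
    "A100": (5120, 2.4, 1555),
    "V100": (4096, 1.75, 900),
    "P100": (4096, 1.4, 720),
}

for name, (bus_bits, rate_gbps, listed) in accelerators.items():
    estimate = memory_bandwidth_gbytes_per_s(bus_bits, rate_gbps)
    gap_percent = 100 * (listed - estimate) / listed
    print(f"{name}: estimated {estimate:.1f} GB/s, "
          f"listed {listed} GB/s, gap {gap_percent:.1f}%")
```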


References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.