Friday, October 2, 2009

Fermi technology unveiled

nVidia has revealed details of its new CUDA platform, Fermi. The design of Fermi was heavily influenced by feedback gathered from users of the G80 and GT200 chipsets. The main areas of focus for Fermi were:

  • Improve Double Precision Performance—while single precision floating point performance was on the order of ten times the performance of desktop CPUs, some GPU computing applications desired more double precision performance as well.
  • ECC support—ECC allows GPU computing users to safely deploy large numbers of GPUs in datacenter installations, and also ensure data-sensitive applications like medical imaging and financial options pricing are protected from memory errors.
  • True Cache Hierarchy—some parallel algorithms were unable to use the GPU’s shared memory, and users requested a true cache architecture to aid them.
  • More Shared Memory—many CUDA programmers requested more than 16 KB of SM shared memory to speed up their applications.
  • Faster Context Switching—users requested faster context switches between application programs and faster graphics and compute interoperation.
  • Faster Atomic Operations—users requested faster read-modify-write atomic operations for their parallel algorithms.
With these goals in mind, nVidia developed Fermi as a significant improvement over its predecessors. The table below illustrates the differences.

GPU                                         | G80             | GT200           | Fermi
--------------------------------------------+-----------------+-----------------+------------------------------
Transistors                                 | 681 million     | 1.4 billion     | 3 billion
CUDA cores                                  | 128             | 240             | 512
Double precision floating point capability  | None            | 30 FMA ops/clock  | 256 FMA ops/clock
Single precision floating point capability  | 128 MAD ops/clock | 240 MAD ops/clock | 512 FMA ops/clock
Warp schedulers (per SM)                    | 1               | 1               | 2
Special Function Units (SFUs) per SM        | 2               | 2               | 4
Shared memory (per SM)                      | 16 KB           | 16 KB           | Configurable 48 KB or 16 KB
L1 cache (per SM)                           | None            | None            | Configurable 16 KB or 48 KB
L2 cache                                    | None            | None            | 768 KB
ECC memory support                          | No              | No              | Yes
Concurrent kernels                          | No              | No              | Up to 16
Load/store address width                    | 32 bit          | 32 bit          | 64 bit
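To put the two floating point rows of the table side by side, here is a small sketch in Python that works out the ratio of double to single precision throughput per clock for each chip. The numbers are copied from the table; the dictionary and function names are made up for this example. These are peak per-clock figures only, so they say nothing about clock speeds or real application speedups.

```python
# Peak operations per clock, copied from the comparison table.
# G80 has no double precision support, so it is recorded as 0.
PEAK_OPS_PER_CLOCK = {
    "G80":   {"single": 128, "double": 0},
    "GT200": {"single": 240, "double": 30},
    "Fermi": {"single": 512, "double": 256},
}


def double_to_single_ratio(gpu):
    """Double precision ops per clock divided by single precision ops per clock."""
    ops = PEAK_OPS_PER_CLOCK[gpu]
    return ops["double"] / ops["single"]


def per_clock_gain(newer, older, precision):
    """How many times more ops per clock the newer chip offers."""
    return PEAK_OPS_PER_CLOCK[newer][precision] / PEAK_OPS_PER_CLOCK[older][precision]


if __name__ == "__main__":
    for gpu in PEAK_OPS_PER_CLOCK:
        print(f"{gpu}: double/single per clock = {double_to_single_ratio(gpu):.3f}")
    for precision in ("single", "double"):
        gain = per_clock_gain("Fermi", "GT200", precision)
        print(f"Fermi vs GT200, {precision} precision: {gain:.2f}x ops per clock")
```

On GT200 double precision runs at one eighth of the single precision rate per clock (30 vs 240), while on Fermi it runs at one half (256 vs 512).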

Put more simply, Fermi features a 384-bit memory interface supporting up to 6 GB of GDDR5 DRAM. Compared with the previous-generation GT200, double precision performance is about 4x faster, PhysX fluid collision calculations for convex shapes are 2.7x faster, and application context switching is 10x faster.
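The table above quotes the older chips in MAD (multiply-add) operations and Fermi in FMA (fused multiply-add) operations. The difference is in rounding: a separate multiply and add rounds the intermediate product before the addition, while a fused operation keeps the exact product and rounds only once at the end. The sketch below illustrates this on the host with ordinary 64-bit Python floats; it is not GPU code, the names `mad` and `fma_emulated` are made up for this example, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def mad(a, b, c):
    """Multiply, round the product to a float, then add and round again."""
    product = a * b          # first rounding happens here
    return product + c       # second rounding happens here


def fma_emulated(a, b, c):
    """Emulate a fused multiply-add: exact product and sum, one final rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding, exact rationals
    return float(exact)                              # single rounding to a float


# Inputs chosen so that the exact product is 1 - 2**-60,
# which is too close to 1.0 to be represented as a 64-bit float.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

if __name__ == "__main__":
    print("separate multiply and add:", mad(a, b, c))
    print("fused (emulated):         ", fma_emulated(a, b, c))
```

With these inputs the separate multiply and add returns exactly 0.0, because the product rounds to 1.0 before the addition, while the fused version keeps the small remainder of -2^-60.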

More details on Fermi can be found in the Fermi white paper.

To make things easier for developers, nVidia is working on Nexus, an integration of CUDA C, OpenCL and DirectCompute into Visual Studio. It is expected to be released very soon, within this month. In the meantime, here is a video of the functionality it will provide.
