Zerg's rumble: Fermi technology unveiled

nVidia has revealed details of its new CUDA platform, Fermi. The development of Fermi was greatly influenced by feedback on the G80 and GT200 GPUs. The main areas of focus were:

Improve Double Precision Performance—while single precision floating point performance was on the order of ten times the performance of desktop CPUs, some GPU computing applications desired more double precision performance as well.
ECC support—ECC allows GPU computing users to safely deploy large numbers of GPUs in datacenter installations, and also ensure data-sensitive applications like medical imaging and financial options pricing are protected from memory errors.
True Cache Hierarchy—some parallel algorithms were unable to use the GPU’s shared memory, and users requested a true cache architecture to aid them.
More Shared Memory—many CUDA programmers requested more than 16 KB of SM shared memory to speed up their applications.
Faster Context Switching—users requested faster context switches between application programs and faster graphics and compute interoperation.
Faster Atomic Operations—users requested faster read-modify-write atomic operations for their parallel algorithms.

With these goals in mind, nVidia developed Fermi as a significant improvement over its predecessors. The table below illustrates the differences.

GPU	G80	GT200	Fermi
Transistors	681 Million	1.4 Billion	3 Billion
CUDA cores	128	240	512
Double Precision Floating Point Capability	None	30 FMA ops/clock	256 FMA ops/clock
Single Precision Floating Point Capability	128 MAD ops/clock	240 MAD ops/clock	512 FMA ops/clock
Warp schedulers (per SM)	1	1	2
Special Function Units (SFUs) / SM	2	2	4
Shared Memory (per SM)	16 KB	16 KB	Configurable 48 KB or 16 KB
L1 Cache (per SM)	None	None	Configurable 48 KB or 16 KB
L2 Cache	None	None	768 KB
ECC Memory Support	No	No	Yes
Concurrent Kernels	No	No	Up to 16
Load/Store Address Width	32 bit	32 bit	64 bit
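
As a rough reading of the table, the per-clock figures can be turned into GT200-to-Fermi ratios. The short Python sketch below does only that arithmetic. The dictionary and function names are made up for illustration, and the ratios are per clock only, so they should not be read as measured application speedups.

```python
# Per-clock figures copied from the table above (GT200 vs Fermi).
# The names below are illustrative only, not part of any nVidia API.
PER_CLOCK = {
    "cuda_cores": {"GT200": 240, "Fermi": 512},
    "double_precision_fma": {"GT200": 30, "Fermi": 256},
    # GT200 is listed in MAD ops/clock and Fermi in FMA ops/clock;
    # both are counted here as one multiply-add per op.
    "single_precision_ops": {"GT200": 240, "Fermi": 512},
}


def per_clock_ratio(metric: str) -> float:
    """Return the Fermi figure divided by the GT200 figure for one table row."""
    row = PER_CLOCK[metric]
    return row["Fermi"] / row["GT200"]


if __name__ == "__main__":
    for metric in PER_CLOCK:
        print(f"{metric}: {per_clock_ratio(metric):.2f}x per clock")
```

The double precision row works out to roughly 8.5x per clock and the single precision row to roughly 2.1x.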

Put more simply, the Fermi GPU has a 384-bit memory interface supporting up to 6 GB of GDDR5 DRAM. Double precision performance is about 4x faster, PhysX fluid collision calculations for convex shapes are 2.7x faster, and application context switching is 10x faster.
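
One way to see why the wider load/store address in the table goes hand in hand with 6 GB of memory: a 32-bit address can only distinguish 2^32 bytes. The sketch below checks this with plain arithmetic. It assumes flat byte addressing and counts 1 GB as 2^30 bytes, and the helper names are hypothetical.

```python
GIB = 2**30  # assumption: 1 GB is counted as 2**30 bytes


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat, byte-addressed pointer of the given width."""
    return 2**address_bits


def fits(memory_bytes: int, address_bits: int) -> bool:
    """True if every byte of memory_bytes can be given its own address."""
    return memory_bytes <= addressable_bytes(address_bits)


if __name__ == "__main__":
    fermi_max_memory = 6 * GIB  # "up to 6 GB" from the text above
    for bits in (32, 64):
        print(f"{bits}-bit addresses cover 6 GB: {fits(fermi_max_memory, bits)}")
```

A 32-bit address tops out at 4 GB under these assumptions, so the full 6 GB is only reachable with the wider address.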

More details can be found in the Fermi white paper.

To make things easier for developers, nVidia is working on Nexus, which integrates CUDA C, OpenCL and DirectCompute development into Visual Studio. It is expected to be released very soon, this month. In the meantime, here is a video of the functionality it will provide.

Friday, October 2, 2009

