- Improve Double Precision Performance—while single precision floating point performance was on the order of ten times the performance of desktop CPUs, some GPU computing applications desired more double precision performance as well.
- ECC support—ECC allows GPU computing users to safely deploy large numbers of GPUs in datacenter installations, and also ensure data-sensitive applications like medical imaging and financial options pricing are protected from memory errors.
- True Cache Hierarchy—some parallel algorithms were unable to use the GPU’s shared memory, and users requested a true cache architecture to aid them.
- More Shared Memory—many CUDA programmers requested more than 16 KB of SM shared memory to speed up their applications.
- Faster Context Switching—users requested faster context switches between application programs and faster graphics and compute interoperation.
- Faster Atomic Operations—users requested faster read-modify-write atomic operations for their parallel algorithms.
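Two of the requests above, more shared memory and faster atomics, meet in a common pattern: a histogram in which each thread block accumulates counts in shared memory and then merges them into a global result. The following is only a minimal sketch of that pattern, not code from the Fermi white paper. The names `histogram256`, `histogramOnGpu` and `NUM_BINS`, and the launch configuration of 64 blocks of 256 threads, are assumptions made for illustration. It also assumes a device that supports atomics on shared memory, which Fermi does.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS 256  // illustrative: one bin per possible byte value

// Each block builds a private histogram in shared memory, then merges it
// into the global histogram. Both steps use atomic read-modify-write.
__global__ void histogram256(const unsigned char *data, int n, unsigned int *bins)
{
    __shared__ unsigned int local[NUM_BINS];

    // Zero the block-private histogram.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        local[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread counts several input bytes.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    // Merge the block-private counts into the global result.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        atomicAdd(&bins[i], local[i]);
}

// Hypothetical host helper (assumes n > 0): copies the input to the device,
// runs the kernel and copies the NUM_BINS counters back into hostBins.
cudaError_t histogramOnGpu(const unsigned char *hostData, int n, unsigned int *hostBins)
{
    unsigned char *dData = nullptr;
    unsigned int *dBins = nullptr;
    const size_t binBytes = NUM_BINS * sizeof(unsigned int);

    cudaError_t err = cudaMalloc(&dData, n);
    if (err == cudaSuccess) err = cudaMalloc(&dBins, binBytes);
    if (err == cudaSuccess) err = cudaMemcpy(dData, hostData, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dBins, 0, binBytes);
    if (err == cudaSuccess) {
        histogram256<<<64, 256>>>(dData, n, dBins);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostBins, dBins, binBytes, cudaMemcpyDeviceToHost);

    cudaFree(dData);  // freeing a null pointer is a no-op
    cudaFree(dBins);
    return err;
}
```

Keeping the per-block counters in shared memory means most atomic traffic stays on chip, and only `NUM_BINS` atomic updates per block reach global memory.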
GPU | G80 | GT200 | Fermi |
Transistors | 681 million | 1.4 billion | 3 billion |
CUDA cores | 128 | 240 | 512 |
Double precision floating point capability | None | 30 FMA ops/clock | 256 FMA ops/clock |
Single precision floating point capability | 128 MAD ops/clock | 240 MAD ops/clock | 512 FMA ops/clock |
Warp schedulers (per SM) | 1 | 1 | 2 |
Special Function Units (SFUs) (per SM) | 2 | 2 | 4 |
Shared memory (per SM) | 16 KB | 16 KB | Configurable 48 KB or 16 KB |
L1 cache (per SM) | None | None | Configurable 48 KB or 16 KB |
L2 cache | None | None | 768 KB |
ECC memory support | No | No | Yes |
Concurrent kernels | No | No | Up to 16 |
Load/store address width | 32-bit | 32-bit | 64-bit |
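The table lists Fermi's shared memory and L1 cache as configurable per SM. In the CUDA runtime that choice is expressed per kernel with `cudaFuncSetCacheConfig`. The sketch below shows one plausible use. The two kernels, `stageInShared` and `gather`, and the helper `configureCaches` are hypothetical and exist only to illustrate the call. The setting is a preference, and hardware with a fixed split may ignore it.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that stages data in shared memory, so it is a
// candidate for the larger shared memory split. Assumes blockDim.x == 256.
__global__ void stageInShared(const float *in, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Each thread reads its neighbour's staged value (wrapping within the tile).
    if (i < n) out[i] = tile[(threadIdx.x + 1) % blockDim.x];
}

// Hypothetical kernel with irregular reads and no shared memory use,
// so it is a candidate for the larger L1 split.
__global__ void gather(const float *in, const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

// Records a cache preference for each kernel before it is launched.
cudaError_t configureCaches()
{
    // Prefer the larger shared memory configuration.
    cudaError_t err = cudaFuncSetCacheConfig(stageInShared, cudaFuncCachePreferShared);
    if (err != cudaSuccess) return err;
    // Prefer the larger L1 configuration.
    return cudaFuncSetCacheConfig(gather, cudaFuncCachePreferL1);
}
```

Calling `configureCaches()` once at start-up is enough; later launches of each kernel use the recorded preference.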
Put more simply, Fermi has a 384-bit memory interface supporting up to 6 GB of GDDR5 DRAM. Compared with the previous generation, double precision calculations are about 4x faster, PhysX fluid collision calculations for convex shapes are 2.7x faster, and application context switching is 10x faster.
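Several of these figures can be read back from an installed card through the CUDA runtime. The sketch below assumes a CUDA toolkit recent enough for `cudaDeviceProp` to include the `memoryBusWidth` and `l2CacheSize` fields. The helper name `printDeviceSummary` is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints a few properties of device `dev` that correspond to the
// features discussed in this post. Returns the CUDA status code.
cudaError_t printDeviceSummary(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) return err;

    printf("Name:                 %s\n", prop.name);
    printf("Compute capability:   %d.%d\n", prop.major, prop.minor);
    printf("Multiprocessors:      %d\n", prop.multiProcessorCount);
    printf("Global memory:        %.1f GB\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    printf("Memory bus width:     %d bits\n", prop.memoryBusWidth);
    printf("L2 cache:             %d KB\n", prop.l2CacheSize / 1024);
    printf("Shared mem per block: %zu KB\n", prop.sharedMemPerBlock / 1024);
    printf("ECC enabled:          %s\n", prop.ECCEnabled ? "yes" : "no");
    printf("Concurrent kernels:   %s\n", prop.concurrentKernels ? "yes" : "no");
    return cudaSuccess;
}
```

Calling `printDeviceSummary(0)` reports on the first GPU in the system; the values printed depend entirely on the installed hardware.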
More details on Fermi can be found in the Fermi white paper.
To make things easier for developers, nVidia is working on Nexus, an integration of CUDA C, OpenCL and DirectCompute into Visual Studio. It is expected to be released very soon, within this month. In the meantime, here is a video of the functionality it will provide.