NVIDIA CUDA Toolkit 3.2
Last update: 11 Dec. 2013
Licence: Free
OS Support: Windows
NVIDIA CUDA Toolkit 3.2 Publisher's Description
New and Improved CUDA Libraries
- CUBLAS performance improved by 50% to 300% on Fermi architecture GPUs for matrix multiplication of all data types and transpose variations
- CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than Intel MKL
- New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations, delivering 5x to 30x faster performance than MKL
- New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random generators that run 10x to 20x faster than similar routines in MKL
- H.264 encode/decode libraries now included in the CUDA Toolkit
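For readers unfamiliar with the XORWOW generator named in the CURAND item above, the following is a minimal CPU-side sketch of the xorwow recurrence as published in Marsaglia's "Xorshift RNGs" paper. It is not CURAND code and does not use the CURAND API. The struct name `Xorwow`, the helper `fill_uniform`, the default state constants (Marsaglia's), and the choice to map the top 24 bits to a float in [0, 1) are all assumptions made for illustration; CURAND's own seeding and output conventions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU reference sketch of the xorwow recurrence from Marsaglia's
// "Xorshift RNGs" (2003). Illustrative only: this is not CURAND code,
// and the default state below is Marsaglia's, not CURAND's seeding.
struct Xorwow {
    std::uint32_t x = 123456789u, y = 362436069u, z = 521288629u;
    std::uint32_t w = 88675123u, v = 5783321u;
    std::uint32_t d = 6615241u;  // Weyl counter added to the xorshift output

    // Advance the state and return the next 32-bit value.
    std::uint32_t next() {
        std::uint32_t t = x ^ (x >> 2);
        x = y; y = z; z = w; w = v;
        v = (v ^ (v << 4)) ^ (t ^ (t << 1));
        d += 362437u;  // unsigned arithmetic wraps modulo 2^32
        return d + v;
    }

    // Map the top 24 bits to a float in [0, 1) (a convention chosen here).
    float next_uniform() {
        return static_cast<float>(next() >> 8) * (1.0f / 16777216.0f);
    }
};

// Hypothetical helper: fill a host buffer with n uniform samples.
void fill_uniform(Xorwow& rng, float* out, std::size_t n) {
    assert(out != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = rng.next_uniform();
    }
}
```

Two `Xorwow` objects constructed with the same state produce the same sequence, which is the property that makes pseudo-random runs reproducible. A GPU library would typically give each thread its own generator state rather than share one.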
CUDA Driver & CUDA C Runtime
- Support for new 6GB Quadro and Tesla products
- New support for enabling high performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations
Development Tools
- Multi-GPU debugging support for both cuda-gdb and Parallel Nsight
- Expanded cuda-memcheck support for all Fermi architecture GPUs
- NVCC support for Intel C Compiler (ICC) v11.1 on 64-bit Linux distros
- Support for debugging GPUs with more than 4GB device memory
Miscellaneous
- Support for memory management using malloc() and free() in CUDA C compute kernels
- New NVIDIA System Management Interface (nvidia-smi) support for reporting % GPU busy and several GPU performance counters
New GPU Computing SDK Code Samples
- Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
- Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
- Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
- Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
- Simple Printf, demonstrating best practices for using both printf and cuPrintf in compute kernels
- Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
- SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
- cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
- Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
- simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs
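As a rough illustration of the idea behind the Function Pointers sample listed above, the sketch below selects a per-pixel operator through a function pointer and applies it to an 8-bit monochrome image on the CPU. It is not the SDK sample's code. The names `PixelOp`, `sobel_op`, `copy_op`, and `apply_op`, the clamped border handling, and the |gx| + |gy| magnitude approximation are all assumptions chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Pixel = std::uint8_t;
using Image = std::vector<Pixel>;

// Every per-pixel operator shares this signature, so the caller can
// swap operators by passing a different function pointer.
using PixelOp = Pixel (*)(const Image& img, int width, int height, int px, int py);

// Read a pixel, clamping coordinates to the image border.
static int at(const Image& img, int width, int height, int px, int py) {
    if (px < 0) px = 0;
    if (px >= width) px = width - 1;
    if (py < 0) py = 0;
    if (py >= height) py = height - 1;
    return img[static_cast<std::size_t>(py) * width + px];
}

// 3x3 Sobel operator; magnitude approximated as |gx| + |gy|, saturated at 255.
Pixel sobel_op(const Image& img, int width, int height, int px, int py) {
    auto p = [&](int dx, int dy) { return at(img, width, height, px + dx, py + dy); };
    int gx = -p(-1, -1) + p(1, -1) - 2 * p(-1, 0) + 2 * p(1, 0) - p(-1, 1) + p(1, 1);
    int gy = -p(-1, -1) - 2 * p(0, -1) - p(1, -1) + p(-1, 1) + 2 * p(0, 1) + p(1, 1);
    int mag = std::abs(gx) + std::abs(gy);
    return static_cast<Pixel>(mag > 255 ? 255 : mag);
}

// Pass-through operator, useful as a baseline.
Pixel copy_op(const Image& img, int width, int height, int px, int py) {
    return static_cast<Pixel>(at(img, width, height, px, py));
}

// Apply whichever operator the function pointer selects to every pixel.
Image apply_op(const Image& img, int width, int height, PixelOp op) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    Image out(img.size());
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            out[static_cast<std::size_t>(py) * width + px] = op(img, width, height, px, py);
        }
    }
    return out;
}
```

Calling `apply_op(img, width, height, sobel_op)` produces an edge map, while passing `copy_op` returns the image unchanged. The traversal loop stays the same in both cases, which is the main appeal of dispatching through a function pointer.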
Windows developers should be sure to check out the new debugging and profiling features in Parallel Nsight v1.5 for Visual Studio.
Please refer to the Release Notes and Getting Started Guides for more information.
In CUDA Toolkit 3.2 and the accompanying release of the CUDA driver, some important changes have been made to the CUDA Driver API to support large memory access for device code and to enable further system calls such as malloc and free. Please refer to the CUDA Toolkit 3.2 Readiness Tech Brief for a summary of these changes.