News and working notes


4-day CUDA Course at Airbus Defence and Space

Applied Parallel Computing LLC delivered a 4-day CUDA course at Airbus Defence and Space, Ulm, Germany.


Use CUDA 7.0 NVRTC with Thrust

Runtime Compilation (NVRTC), introduced in CUDA 7.0, makes it possible to compile CUDA kernels dynamically during program execution (see example). This functionality allows to...


Get extra 8% perf in bilinear interpolation on GPU using __restrict__ keyword

Starting with GK110 (Tesla Kepler), a “const restrict” annotation on a kernel argument has an extra GPU-specific meaning: accesses to that argument should go through ...

Software Engineering, CUDA

Thrust/CUDA tip: reuse temporary buffer across multiple transforms

Thrust is a very handy STL-like template library for rapid data processing on GPUs.


On-the-fly modification of LLVM IR code of CUDA sources

Largely thanks to LLVM, recent years have seen a significant increase in interest in the research and development of domain-specific compilation tools. With the re...

Software Engineering, LLVM

5-day GPU computing workshop at TÜBİTAK UZAY

Applied Parallel Computing LLC delivered the GPU Computing Workshop at the Space Technologies Research Institute (TÜBİTAK UZAY), Ankara, Turkey. We would like to ...


How to find CUDA's version of LLVM backend

It is well known that the CUDA toolkit uses an LLVM backend, but the version it uses is not shown anywhere. We can use gdb and an LLVM API function to print the version string.

Software Engineering, LLVM

NVIDIA Visual Profiler can connect to a 64-bit Linux server from 32-bit Windows

The CUDA 6.0 release added an extremely handy feature to Visual Profiler: support for remote profiling. This means that you can run the profiler GUI from ...

Software Engineering, CUDA

Efficient CPU-GPU data transfers, CUDA 6.0 Unified Virtual Memory

Juraj Kardoš – University of Lugano summer intern and our collaborator – presents a talk on efficient CPU-GPU data transfers and CUDA 6.0 Unified Virtual Memory o...

Software Engineering, CUDA

Calling CUDA device function from OpenACC Fortran kernel

OpenACC is known to be a fast method of developing quite efficient GPU-enabled applications. It is also possible to mix CUDA kernels and libraries with OpenACC ke...
