Gabriel Dax
Machine Learning Research Engineer — CUDA · TensorRT · Distributed Training
I build and optimize deep learning systems end to end, from custom CUDA kernels and multi-node distributed training (32 GPUs / DDP / NCCL) to production inference with TensorRT. Currently at Fraunhofer IIS (Munich), working on industrial defect detection, model compression, and vision-language models. PhD from TU Munich (Algorithmic Information Theory in Spatial ML).
CUDA
C++20
PyTorch DDP
TensorRT
NCCL
NVIDIA DALI
Slurm HPC
Model Compression
CLIP / SAM