Parallel Programming Expert (CUDA/AVX)
Fixed-term contract (CDD) · Grenoble (Isère) · IT development
About EXXA
At EXXA, we are building the most cost-efficient, high-throughput AI infrastructure for large-scale, asynchronous workloads. Our mission is to balance Gen-AI demand with processing supply by leveraging idle GPUs, optimizing batch inference, and improving the inference efficiency of AI models.
If you are passionate about open-source AI, obsessed with performance, and love tackling complex technical challenges, we want to hear from you!
Who we are
We are an early-stage, fast-growing startup, backed by top tech investors and part of Station F’s Future 40 program. Our founding team has deep expertise in AI research and infrastructure, and we are on a mission to make open-source AI more accessible by championing delayed processing for massive workloads. Our unique approach dramatically reduces waste in Gen-AI, unlocking new possibilities for developers and companies alike.
Why you should join us
Technical innovation
We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and to advance throughput-optimized computing.
Remote first
We are a fully distributed team. Work from anywhere in European timezones.
Competitive compensation and benefits
Competitive salary
Early-stage stock options
Private health insurance
30+ paid holidays
Top-notch hardware and equipment
Backed by the best
We are funded by leading VCs and top business angels (announcement coming soon).
Job Description
EXXA is hiring a Parallel Programming Expert (CUDA/AVX) to join the team developing the EXXA inference engine, with a focus on batch processing and throughput rather than low-latency constraints.
Key responsibilities:
· Contribute to the development of the EXXA inference engine
· Profile and optimize the inference engine
· Design and implement efficient inference kernels for GPU and CPU
· Benchmark and validate performance improvements
Preferred Experience
Qualifications:
· Proven expertise in parallel programming using CUDA and/or SIMD instructions
· Experience with performance optimization and profiling
· Familiarity with the Triton kernel language and/or MLIR/XLA intermediate representations is a plus
· Proficiency in C++ or Rust
· Knowledge of the Python ML stack (PyTorch, Hugging Face, etc.)
· 2-3+ years of experience in high-performance computing or a similar field
Recruitment Process
Expect to have at least:
· An intro call with one of our founders
· A technical interview
· A final meeting with the team
Additional Information
· Contract Type: Full-Time
· Location: Paris, Grenoble, Sophia-Antipolis
· Experience: > 2 years
· Possible full remote