Parallel Programming Expert (CUDA/AVX)
Fixed-term contract (CDD) · Grenoble (Isère) · IT development
About EXXA
At EXXA, we are building the most cost-efficient, high-throughput AI infrastructure for large-scale, asynchronous workloads. Our mission is to balance Gen-AI demand with processing supply by leveraging idle GPUs, optimizing batch inference, and improving the inference efficiency of AI models.
If you are passionate about open-source AI, obsessed with performance, and love tackling complex technical challenges, we want to hear from you!
Who we are
We are an early-stage, fast-growing startup, backed by top tech investors and part of Station F’s Future 40 program. Our founding team has deep expertise in AI research and infrastructure, and we are on a mission to make open-source AI more accessible by championing delayed processing for massive workloads. Our unique approach dramatically reduces waste in Gen-AI, unlocking new possibilities for developers and companies alike.
Why you should join us
Technical innovation
We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and to advance throughput-optimized computing.
Remote first
We are a fully distributed team. Work from anywhere in European timezones.
Competitive compensation and benefits
Competitive salary
Early-stage stock options
Private health insurance
30+ paid holidays
Top-notch hardware and equipment
Backed by the best
We are funded by leading VCs and top business angels (announcement coming soon).
Job Description
EXXA is hiring a Parallel Programming Expert (CUDA/AVX) to join the team developing the EXXA inference engine, with a focus on batch processing and throughput rather than low-latency constraints.
Key responsibilities:
· Contribute to the development of the EXXA inference engine
· Profile and optimize the inference engine
· Design and implement efficient inference kernels for GPU and CPU
· Benchmark and validate performance improvements
Preferred Experience
Qualifications:
· Proven expertise in parallel programming using CUDA and/or SIMD instructions
· Experience with performance optimization and profiling
· Familiarity with the Triton kernel language and/or MLIR/XLA intermediate representations is a plus
· Proficiency in C++ or Rust
· Knowledge of the Python ML stack (PyTorch, Hugging Face, etc.)
· 2-3+ years of experience in high-performance computing or a similar field
Recruitment Process
Expect to have at least:
· An intro call with one of our founders
· A technical interview
· A final meeting with the team
Additional Information
· Contract Type: Full-Time
· Location: Paris, Grenoble, Sophia-Antipolis
· Experience: > 2 years
· Possible full remote