Parallel Programming Expert (CUDA/AVX)

  • Fixed-term contract (CDD)
  • Grenoble (Isère)
  • IT development

About EXXA

At EXXA, we are building the most cost-efficient, high-throughput AI infrastructure for large-scale, asynchronous workloads. Our mission is to balance Gen-AI demand and processing supply by leveraging idle GPUs, optimizing batch inference, and improving the inference efficiency of AI models.
If you are passionate about open-source AI, obsessed with performance, and love tackling complex technical challenges, we want to hear from you!

Who we are

We are an early-stage, fast-growing startup, backed by top tech investors and part of Station F’s Future 40 program. Our founding team has deep expertise in AI research and infrastructure, and we are on a mission to make open-source AI more accessible by championing delayed processing for massive workloads. Our unique approach dramatically reduces waste in Gen-AI, unlocking new possibilities for developers and companies alike.

Why you should join us

Technical innovation
We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and push throughput-optimized computing.

Remote first
We are a fully distributed team. Work from anywhere in European timezones.

Competitive compensation and benefits
  • Competitive salary
  • Early-stage stock options
  • Private health insurance
  • 30+ paid holidays
  • Top-notch hardware and equipment

Backed by the best
We are funded by leading VCs and top business angels (announcement coming soon).

Job Description

EXXA is hiring a Parallel Programming Expert (CUDA/AVX) to join the team developing the EXXA inference engine, with a focus on batch processing and throughput rather than low-latency constraints.

Key responsibilities:

·  Contribute to the development of the EXXA inference engine
·  Profile and optimize the inference engine
·  Design and implement efficient inference kernels for GPU and CPU
·  Benchmark and validate performance improvements

Preferred Experience

Qualifications:

·  Proven expertise in parallel programming using CUDA and/or SIMD instructions
·  Experience with performance optimization and profiling
·  Familiarity with the Triton kernel language and/or the MLIR/XLA intermediate representations is a plus
·  Proficiency in C++ or Rust
·  Knowledge of the Python ML stack (PyTorch, HuggingFace, etc.)
·  2-3+ years of experience in high-performance computing or a similar field

Recruitment Process

Expect to have at least:

·  An intro call with one of our founders
·  A technical interview
·  A final meeting with the team

Additional Information

·  Contract Type: Full-Time
·  Location: Paris, Grenoble, Sophia-Antipolis
·  Experience: > 2 years
·  Full remote possible
