Intern: Distributed Inference for Multi-Modal Large Language Models (MLLMs) with Agentic AI Orchestration (M/F)
Internship, Cesson-Sévigné (Ille-et-Vilaine)
Job description
Description
Summary
Multi-Modal Large Language Models (MLLMs) are increasingly capable of processing and reasoning over diverse input modalities such as text, images, audio, and video. However, running such models in real-time or resource-constrained environments poses significant challenges in terms of bandwidth and compute requirements. Distributing the processing of models between client and server (a "distributed computing" approach) is a promising solution. While traditional distributed inference has been applied successfully to DNN models, extending this paradigm to MLLMs is a novel and impactful use case. This internship aims to demonstrate the feasibility of a distributed MLLM inference approach, in which MLLM components are distributed across two endpoints and coordinated through agentic orchestration.

Responsibilities
The intern will be involved in the following tasks:
- Survey recent advances in MLLMs
- Select representative models and a representative agentic architecture, in collaboration with the team
- Implement the selected models as a reference platform
- Propose one or more distributed inference strategies
- Implement and adapt these strategies in Python
- Conduct experiments to measure key performance metrics such as latency, data size, and energy consumption

Keywords: MLLM (Multi-Modal Large Language Model), distributed inference, Python prototyping

Expected Outcomes:
- Hands-on experience with state-of-the-art MLLMs
- Development of an agentic proof of concept
- Potential participation in a scientific paper, publications, and patents

Location: Rennes
Mentors: Thierry Filoche (primary), Stephane Onno, Cyril Quinquis
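For candidates who want a concrete picture of the client/server split and of the metrics named in the tasks, the following is a minimal sketch in Python. It is not the team's method. The functions `client_encoder`, `server_head`, and `run_split_inference` are hypothetical stand-ins, the "model" is a toy averaging operation, and the network hop is simulated by serializing and deserializing the intermediate features in a single process.

```python
import struct
import time


def client_encoder(pixels, patch=4):
    """Hypothetical client-side stage: average each run of `patch` values into one feature."""
    return [
        sum(pixels[i:i + patch]) / len(pixels[i:i + patch])
        for i in range(0, len(pixels), patch)
    ]


def server_head(features):
    """Hypothetical server-side stage: reduce the received features to one score."""
    return sum(features) / len(features)


def pack_floats(values):
    """Serialize a sequence of floats as little-endian float32 (4 bytes per value)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_floats(payload):
    """Inverse of pack_floats; returns a tuple of floats."""
    return struct.unpack(f"<{len(payload) // 4}f", payload)


def run_split_inference(pixels, patch=4):
    """Run both stages and report payload size and wall-clock latency."""
    start = time.perf_counter()
    features = client_encoder(pixels, patch)   # would run on the client
    payload = pack_floats(features)            # bytes that would cross the network
    received = unpack_floats(payload)          # server-side deserialization
    result = server_head(received)             # would run on the server
    latency_s = time.perf_counter() - start
    return {
        "result": result,
        "raw_bytes": len(pack_floats(pixels)),  # cost of sending the raw input instead
        "payload_bytes": len(payload),
        "latency_s": latency_s,
    }


if __name__ == "__main__":
    frame = [float(i % 256) for i in range(1024)]  # stand-in for an input frame
    print(run_split_inference(frame))
```

Comparing `payload_bytes` with `raw_bytes` shows the kind of trade-off a split strategy targets: the client does some compute so that less data has to be transmitted. A real experiment would replace the toy stages with actual model components and measure latency over a real network link.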
Start date
Oct. 24, 2025
Profile
Qualifications
Education: Master's student in Computer Science, Artificial Intelligence, Data Science, or a related field.
Skills:
- Background in AI/ML, particularly large language models
- Knowledge of multi-modal systems (text, vision, speech)
- Proficiency in Python programming and ML frameworks (PyTorch)
- Ability to conduct research and prototype efficiently
Nice to have: familiarity with distributed systems, networking, bandwidth concepts, and the ONNX framework.
Working time
Full time
Function
Informatique_syst_info
Duration (months)
6
Education
RJ/Qualif/Ingenieur_B5
Sector
Ind_hightech_telecom