Image Editing of Complex Visual Scenes via Natural Language M/F
Internship, France
Job description
Description
This internship focuses on the emerging field of natural language-guided image editing, specifically the generation and modification of complex scenes based on verbal descriptions. The candidate will design and implement novel methods that interpret natural language to manipulate or generate detailed images of multifaceted scenarios (e.g., crowd scenes, cityscapes, interactions between multiple objects).

This project presents several key challenges, including:
- Scene complexity: Managing multiple objects and their relationships within a scene adds significant complexity. The goal is to keep the edited images coherent and accurate, even when the described scenes involve intricate interactions between elements.
- Multimodal integration: Combining linguistic and visual inputs to produce visual outputs is a complex problem that requires close interaction between natural language processing (NLP) and computer vision models.

The objectives of this internship are to:
- Investigate current methods for natural language-based image generation and editing of complex scenes, in particular with respect to object counts and geometric positioning;
- Develop an innovative approach for editing complex scenes using natural language descriptions;
- Demonstrate significant improvements over existing methods in the accuracy and detail of generated images;
- Contribute to academic research through potential publications and/or patents.
Start date
7 Oct. 2025
Profile
- Student in the 5th year of studies (M2)
- Computer vision skills
- Machine learning skills (deep learning, LLM, VLM, generative AI)
- Proficiency in Python and in a deep learning framework (especially PyTorch or TensorFlow)
Sector
Ind_hightech_telecom