LLMs for specifying data sharing policies M/F
Fixed-term contract (CDD), Palaiseau (Essonne), IT development
Offer details
General information
Host entity
The CEA is a major research organization serving citizens, the economy and the State. It provides concrete solutions to their needs in four main areas: energy transition, digital transition, technologies for the medicine of the future, and defense and security, all built on a foundation of fundamental research. For more than 75 years, the CEA has been committed to the scientific, technological and industrial sovereignty of France and Europe, for a better-controlled and safer present and future.
Located at the heart of regions equipped with very large research infrastructures, the CEA has a wide range of academic and industrial partners in France, in Europe and internationally.
The CEA's 20,000 employees share three core values:
• A sense of responsibility
• Cooperation
• Curiosity
Reference
2024-33440
Position description
Field
Engineering sciences
Contract
Fixed-term contract (CDD)
Job title
LLMs for specifying data sharing policies M/F
Position status
Professional staff (cadre)
Contract duration (months)
18
Offer description
Developing physical or digital systems is a complex process involving both technical and human challenges. The first step is to give shape to ideas by drafting specifications for the system to come. Usually written in natural language by business analysts, these documents bind all stakeholders for the duration of the project and make it easier to share and understand what needs to be done. Requirements engineering offers various techniques (reviews, modeling, formalization, etc.) to structure this process and improve the quality (consistency, completeness, etc.) of the resulting documents, with the aim of detecting and correcting defects before the system is even implemented.
In requirements engineering, the recent arrival of large language models (LLMs) has the potential to be a “game changer”. We propose to support analysts in writing specifications for data sharing. The goal is to model data sharing policies in ODRL (Open Digital Rights Language) from natural-language text. The tool will use a transformer-based LLM (such as ChatGPT or Llama) combined with rigorous analysis and recommendation methods. It will propose options for rewriting requirements in controlled languages inspired by the INCOSE guidelines or the EARS templates, analyze the results produced by the LLM, and provide an audit of the quality of the resulting model.
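To make the target format concrete, here is a minimal sketch, assuming the policy is serialized as JSON-LD with the W3C ODRL context. The requirement it encodes ("lab B may use lab A's sensor dataset for research only and may not distribute it") is hypothetical, as are the example.com identifiers and the helper names `make_policy` and `check_policy`. The ODRL terms used (`Agreement`, `permission`, `prohibition`, `use`, `distribute`, `purpose`, `eq`) come from the ODRL vocabulary; the checks are a simplified illustration of auditing an LLM-generated policy, not a full ODRL validator.

```python
import json

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"


def make_policy(uid, target, assigner, assignee, purpose):
    """Build a minimal ODRL Agreement (JSON-LD) for a hypothetical requirement:
    the assignee may use the target only for `purpose` and may not distribute it."""
    return {
        "@context": ODRL_CONTEXT,
        "@type": "Agreement",
        "uid": uid,
        "assigner": assigner,
        "assignee": assignee,
        "permission": [{
            "target": target,
            "action": "use",
            "constraint": [{
                "leftOperand": "purpose",
                "operator": "eq",
                "rightOperand": purpose,
            }],
        }],
        "prohibition": [{"target": target, "action": "distribute"}],
    }


def check_policy(policy):
    """Return structural problems found in a policy dict (simplified, not a full ODRL validator)."""
    problems = []
    if policy.get("@type") not in {"Set", "Offer", "Agreement"}:
        problems.append("unknown or missing policy type")
    if "uid" not in policy:
        problems.append("missing uid")
    if not any(policy.get(k) for k in ("permission", "prohibition", "obligation")):
        problems.append("policy contains no rule")
    seen = {"permission": set(), "prohibition": set()}
    for kind, pairs in seen.items():
        for i, rule in enumerate(policy.get(kind, [])):
            # Fall back to a target declared once at policy level.
            target = rule.get("target", policy.get("target"))
            if target is None or "action" not in rule:
                problems.append(f"{kind} #{i} lacks a target or an action")
            else:
                pairs.add((target, rule["action"]))
    # Naive consistency check: identical (target, action) both permitted and prohibited.
    # It ignores ODRL's action hierarchy and any constraints.
    for target, action in sorted(seen["permission"] & seen["prohibition"]):
        problems.append(f"conflict: '{action}' on {target} is both permitted and prohibited")
    return problems


policy = make_policy(
    uid="http://example.com/policy/1",
    target="http://example.com/dataset/sensors",
    assigner="http://example.com/party/lab-a",
    assignee="http://example.com/party/lab-b",
    purpose="research",
)
print(json.dumps(policy, indent=2))
print(check_policy(policy))  # → []
```

Keeping generation (`make_policy`, or an LLM in its place) separate from checking (`check_policy`) reflects the idea of analyzing the LLM's output rather than trusting it; a real audit would also need to reason about ODRL's action hierarchy and constraints.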
More specifically, LLMs are particularly promising for the following uses:
- Automatically transforming unstructured requirements into structured formats such as EARS templates or user stories.
- Classifying requirements (behavioral, non-functional, etc.).
- Flagging ambiguities, inconsistencies or potential violations based on predefined validation heuristics.
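As a rough illustration of the last two uses, a rule-based baseline can recognize EARS templates and flag vague wording with regular expressions. This is a minimal sketch: the pattern names follow the usual EARS templates (ubiquitous, event-driven, state-driven, unwanted behaviour, optional feature), but the regexes, the `VAGUE_TERMS` list and the `analyze` function are hypothetical, and in an LLM-based tool such rules would more likely serve to check or complement the model's output than to replace it.

```python
import re

# Approximate EARS templates (hypothetical regexes for this sketch).
EARS_PATTERNS = [
    ("unwanted behaviour", re.compile(r"^if\b.+,\s*then the\b.+\bshall\b", re.I)),
    ("event-driven", re.compile(r"^when\b.+,\s*the\b.+\bshall\b", re.I)),
    ("state-driven", re.compile(r"^while\b.+,\s*the\b.+\bshall\b", re.I)),
    ("optional feature", re.compile(r"^where\b.+,\s*the\b.+\bshall\b", re.I)),
    ("ubiquitous", re.compile(r"^the\b.+\bshall\b", re.I)),
]

# Hypothetical list of vague terms used as a validation heuristic.
VAGUE_TERMS = ["appropriate", "adequate", "user-friendly", "as soon as possible",
               "if possible", "quickly", "etc.", "and/or", "some"]


def analyze(requirement):
    """Return (matching EARS pattern or None, list of heuristic warnings)."""
    text = requirement.strip()
    pattern = next((name for name, rx in EARS_PATTERNS if rx.search(text)), None)
    warnings = []
    if pattern is None:
        warnings.append("does not match any EARS template")
    lowered = text.lower()
    for term in VAGUE_TERMS:
        # Match whole terms only, so "some" does not fire on "something".
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered):
            warnings.append(f"vague term: '{term}'")
    if len(re.findall(r"\bshall\b", lowered)) > 1:
        warnings.append("several 'shall': possibly a compound requirement")
    return pattern, warnings


for req in [
    "When a partner requests the dataset, the portal shall log the request.",
    "The system shall process requests quickly, etc.",
    "Data must be anonymised if possible.",
]:
    print(analyze(req))
```

Here the first sentence is recognized as event-driven with no warning, the second as ubiquitous with two vague terms, and the third matches no template and contains "if possible".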
LLMs also have limitations that need to be taken into account in the context of requirements engineering: hallucination, non-determinism, algorithmic biases and limited generalization.
As part of the laboratory's “Intelligent Requirements” team, the candidate will:
- Define schemas or a controlled language for representing ODRL policies.
- Evaluate the effectiveness of different techniques and formalisms, such as NLP techniques or metrics inspired by BLEU, for avoiding hallucinations during rewriting.
- Analyze, curate or generate training data for LLMs.
- Configure and steer one or more LLMs using the most effective techniques for improving the consistency and completeness of data sharing policies.
- Develop the software tools required for these tasks.
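One possible reading of drawing on the BLEU metric to limit hallucinations is to compare each rewritten requirement with its source using BLEU's clipped (modified) n-gram precision: words in the rewrite with no support in the source are candidate hallucinations. This is a minimal sketch under stated assumptions: the source is treated as the reference, which adapts rather than reproduces standard BLEU (no brevity penalty, no geometric mean), tokens are plain lower-case words, and `rewrite_report` and `STOP_WORDS` are hypothetical names.

```python
import re
from collections import Counter

# Words added by EARS templates or too common to signal a hallucination
# (hypothetical list for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to",
              "shall", "when", "while", "if", "then", "where"}


def tokens(text):
    """Lower-case word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU's modified n-gram precision: candidate counts are clipped by reference counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(c, ref[g]) for g, c in cand.items()) / total


def rewrite_report(source, rewrite, max_n=2):
    """Compare an LLM rewrite with its source requirement."""
    src, out = tokens(source), tokens(rewrite)
    precision = {n: clipped_precision(out, src, n) for n in range(1, max_n + 1)}
    # Content words of the rewrite that never appear in the source.
    unsupported = sorted({t for t in out if t not in src and t not in STOP_WORDS})
    return precision, unsupported


precision, unsupported = rewrite_report(
    "The portal shall share the sensor data with partners for research.",
    "The portal shall share the sensor data with partners for research and marketing.",
)
print(precision)    # unigram 11/13, bigram 10/12
print(unsupported)  # ['marketing']
```

Because the matching is purely lexical, inflections and synonyms (e.g. "partner" vs "partners") would also be reported as unsupported; a practical tool would probably need lemmatization or semantic similarity on top of this.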
Candidate profile
PhD
Knowledge of Java, Python, Eclipse EMF, Node.js and React