AI Applied Science - MSc/PhD Student Position – Research Lab

  • Haifa, ISRAEL
  • IT development

Job description

Your Role and Responsibilities
Student Researcher in the challenging area of multimodal foundation models, working on tasks at the intersection of the vision, audio, and language modalities. This position tackles real-world tasks in rich document understanding and in speech understanding and generation. The work focuses on multidisciplinary, multimodal foundation models, including training, adapting, and fine-tuning them for a wide variety of real-world tasks.
If you're a student interested in machine learning, deep learning, and the intersection of computer vision, speech and audio analysis, and natural language processing, and you're looking for a place to do research with academic and industrial impact, then this position is for you!
Our team develops technologies, models, algorithms, and software that make an impact on IBM products and on the world; we publish papers and issue patents based on the work we do.

Roles and Responsibilities:
The responsibilities involve solving real-world problems using cutting-edge deep learning and machine learning methods, with the aim of advancing the state of the art in document understanding and in speech analysis and generation.
Document understanding is the ability to read documents, understand their structure and content, and extract and act upon that content. This is a crucial technology, as business documents are key to the day-to-day operation of organizations.
Document understanding remains a research challenge that requires a multi-disciplinary perspective, spanning textual analysis, visual comprehension, layout understanding, knowledge representation, data mining and more.
Speech and audio technologies provide the ability to understand as well as generate audio and speech. In particular, speech recognition and synthesis are key components of natural spoken interaction, which is key to customer care in organizations. This also requires a multi-disciplinary perspective, spanning conversational and generative AI and modeling for speech, language, and audio.
The areas we are looking at also include multimodal and foundation models, image and audio understanding, data synthesis, expressive speech synthesis, and tokenization.
To achieve these goals, you will collaborate with fellow team members and have access to nearly limitless compute power (GPU). The topics include novel self-supervised learning techniques, realistic data synthesis, multimodal research, and more.

During your time at IBM, you will have the opportunity to publish your work in top AI conferences and to develop a prototype demonstrating new AI functionality.
Succeeding in these tasks is expected to make an important impact on the research community in these exciting fields and to lead to strong publications in leading AI venues (e.g. CVPR / ICLR / ICCV / ICASSP / Interspeech).

Location: (both are possible)
Haifa Research Lab (in the Haifa University Campus)
IBM Site in Hashahar Tower, Givataim (near Tel Aviv Arlozorov train station)

Required Technical and Professional Expertise

• M.Sc. or Ph.D. student with knowledge in Machine Learning, Computer Vision or Speech and Audio analysis, Deep Learning.
• Strong background using modern deep learning (DL) methods and deep knowledge of the recent literature; prior ML/DL publications are an advantage.
• Strong Python coding skills. Experience with PyTorch or TensorFlow is an advantage.
• Team player with great social skills, willingness to collaborate
• Strong background in Deep Learning methods. Knowledge of the recent literature and the ability to discuss architectural concepts are an advantage.

Please add your grade sheet to your application.

