Job Summary
A company is looking for a Senior Research Engineer - Multimodal & Video Foundation Model (Remote).
Key Responsibilities
- Pioneer multimodal and video-centric research to create usable prototypes and scalable systems
- Design and implement novel AI architectures for multimodal language models that integrate modalities such as text, image, audio, and video
- Engineer scalable training and inference pipelines optimized for large-scale multimodal datasets
Required Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
- Expertise in Python and PyTorch, with experience across the full development pipeline
- Experience with large-scale text data and/or interleaved data spanning audio, video, image, and/or text
- Direct hands-on experience developing or benchmarking LLMs, vision-language models, audio-language models, or generative video models
- PhD in a relevant field is a plus