Job Summary
A company is looking for a Research Scientist / Engineer - Multimodal Capabilities.
Key Responsibilities
- Identify gaps in model capabilities and research solutions to close them
- Design datasets, experiments, and methodologies to improve model capabilities across vision, audio, and language
- Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
Required Qualifications
- Strong programming skills in Python and PyTorch
- Experience with multimodal data processing pipelines and large-scale dataset curation
- Understanding of computer vision, audio processing, and/or natural language processing techniques
- Expertise in working with interleaved multimodal data (preferred)
- Hands-on experience with Vision Language Models, Audio Language Models, or generative video models (preferred)