Job Summary
A company is looking for a Model Performance Engineer to optimize inference performance for AI models on their platform.
Key Responsibilities
- Optimize inference performance by minimizing latency and maximizing throughput
- Experiment continuously to achieve industry-leading performance for various models
- Impact the performance of applications serving millions of users globally
Required Qualifications
- Experience with state-of-the-art inference stacks such as PyTorch, TensorRT, or vLLM
- Open to candidates with any level of experience, including new graduates
- Ability to work in a fast-paced environment and adapt to new challenges
- Willingness to work in person in New York City (remote work considered for exceptionally qualified candidates)
- Visa sponsorship available for qualified candidates