Job Summary
A company is looking for a Machine Learning Engineer to optimize inference performance for AI models on their platform.
Key Responsibilities
- Optimize inference performance for various AI models to minimize latency and maximize throughput
- Continuously experiment to achieve industry-leading performance on the platform
- Impact the performance of applications serving millions of users worldwide
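To give a concrete flavor of the latency/throughput work described above, here is a minimal, self-contained Python sketch of how batching trades per-request latency against overall throughput. The `fake_model` function and its timing constants are purely hypothetical stand-ins for a real model forward pass, not anything specific to this platform.

```python
import time
import statistics

def fake_model(batch):
    """Hypothetical stand-in for a model forward pass:
    cost grows sub-linearly with batch size, as on a GPU."""
    time.sleep(0.001 + 0.0001 * len(batch))
    return [x * 2 for x in batch]

def benchmark(batch_size, n_requests=64):
    """Run n_requests through fake_model in batches and report
    median batch latency and end-to-end throughput."""
    latencies = []
    for start in range(0, n_requests, batch_size):
        batch = list(range(start, min(start + batch_size, n_requests)))
        t0 = time.perf_counter()
        fake_model(batch)
        latencies.append(time.perf_counter() - t0)
    total = sum(latencies)
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "throughput_rps": n_requests / total,
    }

single = benchmark(batch_size=1)
batched = benchmark(batch_size=8)
```

With these assumptions, larger batches raise throughput at the cost of higher per-batch latency, which is exactly the trade-off an inference engineer tunes.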
Required Qualifications
- 1+ years of experience with state-of-the-art inference stacks (e.g., PyTorch, TensorRT, vLLM)
- Familiarity with modern AI workflows, including ComfyUI and LoRA adapters
- Deep understanding of model compilation, quantization, and serving architectures
- Experience with GPU architectures and kernel-level optimizations
- Proficiency in programming with CUDA, Triton, or similar low-level accelerator frameworks
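As an illustration of the quantization work referenced in the qualifications, here is a minimal sketch of symmetric int8 post-training quantization in plain Python. The function names are hypothetical; production stacks such as TensorRT or vLLM implement calibrated, per-channel variants of this idea.

```python
def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale.
    (Illustrative per-tensor scheme; real systems often quantize per channel.)"""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.02, -0.5, 0.31, 1.27, -1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why 8-bit weights often preserve model quality while cutting memory traffic.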