Job Summary
A company is looking for a Member of Technical Staff - LLM Inference.
Key Responsibilities
- Drive breakthroughs in LLM inference optimization, with a focus on structured generation
- Deploy and optimize inference engines to enhance performance and reduce latency
- Collaborate in a remote environment to innovate and improve AI systems
Required Qualifications
- Proven experience with inference engines such as vLLM, SGLang, or TensorRT
- Hands-on knowledge of NVIDIA GPU architecture, including CUDA
- Experience with distributed inference and low-latency communication
- Background in LLM MLOps, including monitoring and scaling inference services
- Proficiency in Python and familiarity with containerization technologies