Job Summary
A company is looking for a Senior MLOps Engineer to design and scale infrastructure for AI research and product development.
Key Responsibilities
- Identify and resolve infrastructure and software bottlenecks to enhance ML job performance
- Translate research workflows into automated and scalable systems for efficient experimentation
- Develop CI/CD workflows and observability frameworks for large-scale training clusters
Required Qualifications
- BS in Computer Science, Information Systems, Computer Engineering, or equivalent experience
- 8+ years of experience in large-scale software or infrastructure systems, with 5+ years in ML platforms or MLOps
- Proven experience designing and operating ML infrastructure for production workloads
- Expertise in distributed training frameworks and orchestration systems
- Strong programming skills in Python and at least one systems language (Go, C++, or Rust)