Job Summary
A company is looking for a Senior Site Reliability Engineer for the NIM Factory.
Key Responsibilities
- Operate a software factory that transforms AI models into deployable services across various environments
- Collaborate with development teams to enhance technical strategies and ensure service availability and performance
- Participate in on-call rotations to maintain the reliability of NVIDIA NIMs and the NIM Factory
Required Qualifications
- BS or MS in Computer Science, Computer Engineering, or equivalent experience
- 8+ years of experience as an SRE or Developer in high-performance microservices and cloud software
- Advanced systems engineering skills in managing distributed, microservices-based cloud applications
- Experience operating containerized applications using Docker, Kubernetes, and Infrastructure as Code tools
- Proven ability to mentor teams and work with multi-functional teams across organizational boundaries