Job Summary
A company is looking for a Senior Manager, Site Reliability Engineering - DGX Cloud.
Key Responsibilities
- Recruit, develop, and mentor a team of Site Reliability Engineers while promoting a culture of collaboration and technical excellence
- Establish standard SRE practices and drive continuous improvement in system reliability and performance
- Lead automation initiatives across the service lifecycle and oversee incident response processes
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience
- 10+ years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 5 years in a leadership capacity
- Proven experience with large-scale distributed systems in a cloud environment
- Deep expertise in Kubernetes, containerization, and microservices architecture
- Extensive experience with infrastructure automation tools and proficiency in at least one high-level programming language