Job Summary
A company is looking for a Site Reliability Engineer 4.
Key Responsibilities
- Manage system availability, health, and service levels of large-scale cloud infrastructure
- Proactively monitor, diagnose, and analyze failures, providing support for software engineers in debugging production issues
- Own the entire incident lifecycle, including reporting, analysis, handling, and closure
Required Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or equivalent
- Minimum 5 years of DevOps/SRE experience
- 3 years' experience working with AWS and/or GCP
- Technical experience with EC2, IAM, S3, Kubernetes, Jenkins, Prometheus, and Linux
- General understanding of distributed systems and data management technologies