Job Summary
A company is looking for a Site Reliability Engineer.
Key Responsibilities
- Ensure high reliability and availability while meeting SLAs and SLOs, as measured by SLIs
- Monitor systems for issues and respond to incidents to maintain high availability
- Drive incident resolution and process improvements to minimize downtime
Required Qualifications
- 5 to 10 years of experience as a Site Reliability Engineer or in a similar role in a large-scale production environment
- Proficiency in scripting languages such as Python and Bash; understanding of Go and PHP is a plus
- Deep knowledge of monitoring systems like Datadog, Prometheus, and Grafana
- Experience with Docker, Kubernetes, and infrastructure automation tools like Terraform
- Familiarity with Linux-based infrastructure and system administration