Job Summary
A company is looking for a Senior Site Reliability Engineer, APAC.
Key Responsibilities
- Develop and maintain observability solutions using platforms like Datadog, Prometheus, and Grafana
- Lead incident management efforts, coordinating responses and troubleshooting issues
- Collaborate with product engineering teams to architect reliable systems and implement monitoring strategies
Required Qualifications
- 5+ years of experience in Site Reliability Engineering or similar DevOps roles
- 5+ years of hands-on experience with Kubernetes, including managing Kubernetes infrastructure
- Strong experience with modern monitoring stacks including Prometheus, Grafana, and Datadog
- Proficiency in at least one systems programming language, such as Go, Rust, C, or Java
- Experience with Infrastructure as Code tools such as Terraform and Helm