Job Summary
A company is looking for a Staff Site Reliability Engineer, EMEA.
Key Responsibilities
- Develop and maintain observability solutions using platforms like Datadog, Prometheus, and Grafana
- Lead incident management processes, including coordinating responses and troubleshooting issues
- Collaborate with product engineering teams to design reliable systems and implement monitoring strategies
Required Qualifications
- 8+ years of experience in Site Reliability Engineering or similar DevOps roles
- 6+ years of experience architecting applications for Kubernetes and managing Kubernetes infrastructure
- Strong experience with monitoring stacks like Prometheus, Grafana, and Datadog
- Experience in at least one systems programming language (Go, Rust, C, or Java)
- Expertise with Infrastructure as Code tools, such as Terraform and Helm