Job Summary
A company is looking for a Senior Site Reliability Engineer.
Key Responsibilities
- Engineer and operate deployment and platform services, focusing on Kubernetes Operators
- Manage and optimize core infrastructure, ensuring reliability and performance
- Lead incident management and on-call response, improving documentation and disaster recovery plans
Required Qualifications
- Strong background in software engineering with expertise in large-scale distributed systems
- Expertise in Kubernetes and cloud platforms (e.g., AWS, GCP, Azure)
- Proficiency in programming/scripting languages such as Go, Python, or Bash
- Hands-on experience with monitoring and alerting systems (e.g., Prometheus, Grafana)
- Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible