Job Summary
A company is looking for a Senior Site Reliability Engineer - Midnight.
Key Responsibilities
- Design, build, and maintain scalable systems on AWS and manage Kubernetes clusters for high availability
- Implement monitoring solutions, lead incident response, and collaborate with development teams to define SLOs and SLIs
- Evaluate and adopt new technologies, documenting processes and best practices for continuous improvement
Required Qualifications
- 7+ years of experience in SRE, DevOps, or a related role
- Strong programming proficiency in Python, Golang, or JavaScript; Rust experience is advantageous
- Demonstrated experience with AWS, modern cloud architectures, Kubernetes/EKS, and GitOps methodologies
- Proficiency in Helm, Terraform, and CI/CD tools like GitHub Actions and Argo CD
- Experience with monitoring tools such as Prometheus and familiarity with the LGTM stack (Loki, Grafana, Tempo, Mimir)