Job Summary
A company is looking for a Site Reliability Engineer Lead.
Key Responsibilities
- Design, implement, and maintain observability and monitoring systems for application stability and performance
- Establish and own service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs) across key systems
- Drive incident management best practices and lead synthetic and load testing initiatives
Required Qualifications
- 5+ years of experience in SRE, DevOps, or Infrastructure Engineering, including at least 2 years in a lead role
- Proven experience scaling observability platforms and implementing SRE principles
- Deep experience with Prometheus, PromQL, and Grafana, plus familiarity with Google Cloud Platform (GCP)
- A track record of creating robust incident response and postmortem practices
- Ability to plan for scale and prioritize reliability across engineering teams