Job Summary
A company is looking for a Senior Site Reliability Engineer (Observability & Resilience).
Key Responsibilities
- Design and implement observability patterns, including metrics, logging, tracing, and alerting
- Build internal tooling and dashboards to provide real-time system insights
- Define and maintain SLIs and SLOs, establishing best practices for alert tuning and incident response
Required Qualifications
- At least 5 years of experience in an SRE, DevOps, or observability-focused role
- Experience designing systems for high availability and disaster recovery
- Deep experience with observability tools such as Grafana, Prometheus, and Datadog
- Strong proficiency with Terraform and infrastructure-as-code workflows
- Passion for enabling product engineers through training and collaboration