Job Summary
A company is looking for a Lead Site Reliability Engineer to enhance platform resilience and operational excellence.
Key Responsibilities
- Design and implement internal developer tools and observability systems
- Lead the adoption of modern DevOps practices across engineering teams
- Develop automated systems for alerting, diagnostics, and incident response
Required Qualifications
- 5+ years of experience in Site Reliability Engineering, with a focus on reliability or infrastructure
- Hands-on experience with observability platforms, preferably Datadog
- Proven track record in incident response and incident management protocols
- Strong troubleshooting skills in complex, distributed environments