Job Summary:
A company is looking for a remote Principal Site Reliability Engineer to lead the design and implementation of resilient systems.
Key Responsibilities:
- Lead the implementation of observability and monitoring standards, including SLIs, SLOs, and error budgets
- Design and execute resiliency tests and disaster recovery exercises to identify and mitigate system weaknesses
- Enhance CI/CD pipelines with automated performance testing and drive cloud adoption strategies focused on resiliency
Required Qualifications:
- 10+ years of experience in software engineering, DevOps, or SRE roles, with 3+ years in a principal or lead capacity
- 5+ years of experience with CI/CD tooling such as Jenkins or GitHub Actions
- 5+ years of experience with container orchestration on cloud platforms
- 3+ years of experience with observability and monitoring tools
- 3+ years of experience with chaos engineering, disaster recovery planning, and performance testing