Job Summary
A company is looking for a Principal Engineer, Production Operations.
Key Responsibilities
- Design, build, and maintain highly reliable, scalable, and performant cloud infrastructure and systems
- Define and implement reliability standards, including SLIs/SLOs and error budgets
- Lead incident response efforts and drive improvements through postmortems and knowledge sharing
Required Qualifications
- Deep technical expertise in Site Reliability Engineering (SRE) and cloud infrastructure
- Extensive experience with AWS and with building secure, scalable systems
- Strong incident response skills, with experience leading the response to critical outages
- Ability to influence and collaborate across engineering, product, and security teams
- Experience with automation and infrastructure-as-code practices