Job Summary
A company is looking for a Principal Site Reliability Engineer - Cloud (Remote).
Key Responsibilities
- Champion and implement application and infrastructure monitoring and alerting to ensure system availability, performance, and scalability
- Evaluate, prototype, and integrate the latest tools and technologies into work processes
- Participate in on-call duties and lead the triage and root cause analysis of production incidents
Required Qualifications
- BS in Computer Science or equivalent work experience
- 8+ years' experience writing software in modern languages such as C# .NET or Java
- 5+ years' experience implementing production performance and availability monitoring using tools like New Relic or Datadog
- Strong DevOps focus with experience in Infrastructure as Code using Terraform or similar technologies
- Experience securing Windows or Linux systems in a 24x7 production environment