Job Summary
A company is looking for a Site Reliability Engineer to join a high-impact cloud infrastructure team.
Key Responsibilities
- Drive the creation and evolution of observability systems including dashboards, logging, alerting, and instrumentation
- Identify trends, anomalies, and early warning signs through data analysis
- Collaborate across teams using agile ceremonies and direct feedback loops
Required Qualifications
- Deep knowledge of observability tooling, preferably with Datadog
- Hands-on SRE experience within AWS, including Lambda, containers, and IAM
- Strong programming skills in Python and Ruby
- Experience with Terraform and infrastructure as code (IaC) practices
- Familiarity with incident management, on-call rotations, and SLAs