Job Summary
A company is seeking a Site Reliability Engineer (SRE) with expertise in cloud infrastructure and automation.
Key Responsibilities
- Define and monitor system KPIs, including SLOs/SLAs, and build dashboards for monitoring and alerting
- Respond to incidents, identify root causes, and fix system issues; participate in on-call rotations
- Optimize system performance, ensure scalability, and manage capacity planning and load testing
Required Qualifications
- Strong background in cloud infrastructure
- Experience with automation and monitoring tools
- Ability to build and maintain CI/CD pipelines
- Familiarity with self-healing systems
- US citizens or Green Card holders preferred