Job Summary
A company is looking for a Datacenter Observability and Site Reliability Engineer.
Key Responsibilities
- Design, implement, and maintain observability solutions for datacenter infrastructure
- Implement SRE best practices and develop automation scripts for infrastructure management
- Provide support for observability and reliability-related issues, including troubleshooting and documentation
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 8+ years of experience in datacenter observability and site reliability engineering
- Proficiency in observability tools and technologies (e.g., Prometheus, Grafana, ELK Stack)
- Experience with SRE practices and tools (e.g., Kubernetes, Docker, Terraform)
- Strong programming and scripting skills (e.g., Python, Go, Bash)