Job Summary
A company is looking for a Senior Site Reliability Engineer - Observability and Telemetry Platform.
Key Responsibilities
- Design, implement, and support the operational and reliability aspects of a large-scale observability and telemetry collection platform
- Engage in and improve the lifecycle of services from inception and design through deployment and operation
- Maintain services by measuring and monitoring availability, latency, and overall system health
Required Qualifications
- BS degree in Computer Science or a related technical field, or equivalent experience
- 5+ years of experience with infrastructure automation and distributed systems design
- 8+ years of experience delivering foundational infrastructure and observability platforms
- Experience in one or more programming languages such as Python, Go, Perl, or Ruby
- In-depth knowledge of Linux, networking, and containers