Remote Jobs

Senior Site Reliability Engineer

8/30/2025

No location specified

Job Summary

A company is looking for a Senior Site Reliability Engineer, DGX Cloud.

Key Responsibilities
  • Support large-scale Kubernetes services and manage system creation, capacity, and launch reviews
  • Build and maintain operational reliability for large-scale Kubernetes clusters with a focus on performance and monitoring
  • Lead incident response and root-cause analysis while maintaining service health and optimizing GPU workloads across cloud platforms

Required Qualifications
  • BS in Computer Science or related technical field, or equivalent experience
  • 12+ years of experience operating production services at scale
  • Expert-level knowledge of Kubernetes administration and microservices architecture
  • Experience with infrastructure automation tools and proficiency in at least one high-level programming language
  • In-depth knowledge of Linux, networking fundamentals, and SRE principles
