Remote Jobs

Site Reliability Engineer

7/11/2025

No location specified

Job Summary

A company is looking for a Site Reliability Engineer.

Key Responsibilities
  • Deploy clusters of 1,000+ GPUs and modify tools for customer solutions
  • Validate and optimize compute, storage, and networking infrastructure
  • Debug production issues and build internal tooling to enhance deployment efficiency

Required Qualifications
  • 2+ years of experience in SRE, DevOps, Sysadmin, or HPC engineering
  • Experience deploying and operating Kubernetes and/or SLURM clusters
  • Proficiency in Go, Python, and Bash
  • Familiarity with automation tools like Ansible and Terraform
  • Strong engineering background in Computer Science, Software Engineering, Math, or related fields
