Remote Jobs

Senior Site Reliability Engineer

10/2/2025

No location specified

Job Summary

A company is looking for a Senior Cluster Site Reliability Engineer.

Key Responsibilities
  • Respond to and resolve urgent cluster outages or issues
  • Ensure high cluster uptime and track SLAs for reliability
  • Diagnose recurring problems and collaborate on engineering solutions

Required Qualifications
  • 5+ years of experience in SRE or DevOps roles
  • Knowledge of HPC/batch compute frameworks and machine learning training systems
  • Ability to develop scripts in a common scripting language
  • Familiarity with infrastructure-as-code and cloud infrastructure
  • Bachelor's degree in computer science or equivalent experience
