Remote Jobs

Senior Site Reliability Engineer

10/2/2025

No location specified

Job Summary

A company is looking for a Senior Cluster Site Reliability Engineer.

Key Responsibilities

Respond to and resolve urgent cluster outages or issues
Ensure high cluster uptime and track SLAs for reliability
Diagnose recurring problems and collaborate on engineering solutions

Required Qualifications

5+ years of experience in SRE or DevOps roles
Knowledge of HPC/batch compute frameworks and machine learning training systems
Ability to develop scripts in a common scripting language
Familiarity with infrastructure-as-code and cloud infrastructure
Bachelor's degree in computer science or equivalent experience

Similar Jobs

AI Enablement Engineer

9/27/2025

Remote Jobs

Senior 3Cs Engineer

9/25/2025

Remote Jobs

Manufacturing Solutions Engineer

9/26/2025

Remote Jobs

Database Engineer (with C# Focus) – Remote

9/26/2025

Remote Jobs

Principal Forward Deployed Engineer

9/24/2025

Remote Jobs

Developer Relations Engineer

10/1/2025

Remote Jobs

Lead Estimator – Construction – (Full-time Remote or Hybrid)

9/22/2025

Remote Jobs

Senior Site Reliability Engineer

10/3/2025

Remote Jobs

Enterprise Architect

9/19/2025

Remote Jobs

Enterprise Platform Engineer

9/26/2025

Remote Jobs

Junior Network Engineer

9/30/2025

Remote Jobs

Applied AI Engineer

10/1/2025

Remote Jobs

Senior Systems Engineer, Slack

9/30/2025

Remote Jobs