Remote Jobs

Principal Site Reliability Engineer

7/31/2025

Remote

Job Summary

A company is looking for a Principal Site Reliability Engineer, AI Infrastructure.

Key Responsibilities

Architect and scale globally distributed production systems for AI/ML and HPC across hybrid and multi-cloud environments
Design and implement automation frameworks to enhance system resilience and operational efficiency
Lead initiatives to assess operational maturity and establish long-term reliability strategies in collaboration with various teams

Required Qualifications

15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure
Deep expertise in Linux/Unix systems and public/private cloud platforms (AWS, GCP, Azure, OCI)
Expert-level programming skills in Python and familiarity with languages such as C++, Go, or Rust
Experience with Kubernetes, microservice orchestration, and observability frameworks
Degree in Computer Science or related field, or equivalent experience

Similar Jobs

Senior Electrical Engineer

7/25/2025

Remote Jobs

Senior Principal React Native Engineer

7/24/2025

Remote Jobs

Full Stack Developer

8/1/2025

Remote Jobs

FullStack Engineer II

7/19/2025

Remote Jobs

Senior Engineering Manager

7/23/2025

Remote Jobs

Solutions Architect (Insurance)

7/30/2025

Remote Jobs

Senior Manufacturing Support Technician

7/27/2025

Remote Jobs

Project Architect - IT/OT Integration

7/29/2025

Remote Jobs

Vice President of Engineering

7/31/2025

Remote Jobs

HubSpot Developer

7/30/2025

Remote Jobs

Senior Site Reliability Engineer

8/1/2025

Remote Jobs

Engineering Manager for Web Standards

7/30/2025

Remote Jobs

Cyber Recovery SRE (Networking)

7/29/2025

Remote Jobs