Remote Jobs

Senior AI HPC Cluster Engineer

8/1/2025

No location specified

Job Summary

A company is looking for a Senior AI-HPC Cluster Engineer - MLOps.

Key Responsibilities
  • Provide leadership and mentorship on managing large-scale HPC systems, including compute, networking, and storage deployment
  • Develop scalable automation solutions for GPU-accelerated computing and support researchers with performance analysis and optimizations
  • Conduct root cause analysis, proactively address issues, and build innovative tooling to enhance researchers' efficiency

Required Qualifications
  • Bachelor's degree in Computer Science, Electrical Engineering, or related field, or equivalent experience
  • Minimum of 6 years of experience with large-scale compute infrastructure
  • Experience with AI/HPC job schedulers and orchestrators, such as Slurm or Kubernetes
  • Proficiency with Linux distributions and container technologies such as Docker
  • Proficiency in one scripting language and at least one compiled language
