Job Summary
A company is looking for a Senior AI-HPC Cluster Engineer.
Key Responsibilities
- Provide leadership and strategic guidance on managing large-scale HPC systems, including compute, networking, and storage deployment
- Develop and improve the ecosystem around GPU-accelerated computing and scalable automation solutions
- Build and maintain AI and ML heterogeneous clusters on-premises and in the cloud
Required Qualifications
- Bachelor's degree in Computer Science, Electrical Engineering, or related field, or equivalent experience
- Minimum 8 years of experience designing and operating large-scale compute infrastructure
- Experience with advanced AI/HPC job schedulers such as Slurm, K8s, RTDA, or LSF
- Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions
- Solid understanding of cluster configuration management tools such as Ansible, Puppet, or Salt