Job Summary
A company is looking for an Infrastructure Engineer (InfiniBand / NCCL).
Key Responsibilities
- Design, build, and maintain automation, APIs, and frameworks for managing physical infrastructure at scale
- Develop and extend systems for server lifecycle management
- Implement and tune InfiniBand networking and NCCL configurations for multi-GPU communication
Required Qualifications
- 8+ years of professional experience in infrastructure engineering, HPC, or related domains
- Strong experience with Linux in production environments
- Proficiency in Python or similar languages for automation
- Deep understanding of InfiniBand networking and related technologies
- Familiarity with NCCL, CUDA, and GPU topology optimization