Job Summary
A company is looking for an AI and ML Storage Infrastructure Software Engineer, GPU Clusters.
Key Responsibilities
- Collaborate with AI and ML research teams to identify and address their storage infrastructure needs
- Monitor and optimize infrastructure performance for high availability and efficient resource utilization
- Define and improve metrics for AI researcher efficiency as it relates to storage
Required Qualifications
- BS in Computer Science or a related field, or equivalent experience, plus 6+ years of experience with AI/ML and HPC workloads
- Hands-on experience with HPC infrastructure and knowledge of accelerated computing and storage technologies
- Expertise in running large-scale distributed training workloads using frameworks such as PyTorch or JAX
- Proficiency in programming languages like Python, Go, and Bash, and familiarity with cloud computing platforms
- Passion for continuous learning in AI/ML infrastructure technologies