GPU Infrastructure Engineer

10/3/2025

Remote

Job Summary

A company is looking for a GPU and HPC Infrastructure Engineer - New College Grad 2025.

Key Responsibilities
  • Contribute to the automation of datacenter operations and lifecycle management for large-scale Machine Learning systems
  • Implement monitoring and health management capabilities for GPU assets to ensure reliability and scalability
  • Develop software for NVLink topology management and build automated test infrastructure for distributed systems

Required Qualifications
  • Pursuing or recently completed a BS or MS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree
  • Software engineering experience on large-scale production systems
  • Strong knowledge of a systems programming language (Go, Python) and understanding of data structures and algorithms
  • High-level knowledge of Linux system administration and cluster management systems (Kubernetes, SLURM)
  • Understanding of performance, security, and reliability in complex distributed systems
