Job Summary
A company is looking for an HPC Infrastructure Engineer.
Key Responsibilities
- Design, develop, and maintain Ansible playbooks for provisioning and configuring compute nodes, storage systems, and network services
- Automate OS and middleware upgrades, security patching, and routine maintenance tasks across HPC clusters
- Collaborate with application owners to optimize cluster performance for CAE, data analytics, and AI/ML workloads
Required Qualifications
- Experience with Ansible Automation Platform (AAP)
- Knowledge of HPC infrastructure and cluster management
- Familiarity with monitoring tools such as Prometheus and Grafana
- Experience troubleshooting complex hardware and software issues
- Ability to document system designs and mentor junior engineers