Job Summary
A company is looking for a Machine Learning Ops Engineer (LATAM).
Key Responsibilities
- Manage infrastructure and propose stacks to meet business objectives
- Troubleshoot AI infrastructure issues and optimize model training speed
- Evaluate and implement new AI training and development platforms
Required Qualifications
- 5+ years of experience with ML Ops tools (e.g., SLURM, MLflow, Kubeflow)
- Experience managing Kubernetes clusters and distributed training workloads
- Proficiency in containerization tools (Docker, Singularity)
- Familiarity with deep learning frameworks (PyTorch, TensorFlow)
- Strong scripting skills in Python, Bash, or similar languages