Job Summary
A company is looking for a Principal Infra and Ops Engineer to manage operations related to an enterprise AI/ML platform.
Key Responsibilities:
- Implement automation across the infrastructure lifecycle using Infrastructure as Code (IaC) and DevOps principles
- Develop and implement monitoring frameworks for infrastructure, ensuring high availability and performance optimization
- Design and test disaster recovery and business continuity plans to maintain data integrity and minimize downtime
Required Qualifications:
- Bachelor's degree in computer science, information technology, or a related STEM field
- 8+ years of infrastructure experience, particularly with Microsoft Azure, AWS, or GCP
- 6+ years of experience with Infrastructure as Code (IaC) and CI/CD tools, such as Terraform and GitHub Actions
- 4+ years of experience with containerization and orchestration technologies like Docker and Kubernetes
- 4+ years of scripting and automation experience, with proficiency in languages like Python and Bash