Job Summary
A company is looking for a Principal Infrastructure Site Reliability Engineer (SRE).
Key Responsibilities
- Support predictive AI workloads in production, including troubleshooting across various layers
- Define and implement observability strategies for AI systems using monitoring tools
- Provide on-call support for GenAI and predictive pipelines, ensuring system health and performance
Required Qualifications
- Hands-on experience with vector databases like Elasticsearch and understanding of indexing strategies
- Strong command of Linux systems, including shell scripting and system-level monitoring
- Proficiency in Python for automation scripting and building AI models
- Familiarity with big data technologies, particularly Hadoop-based platforms
- Experience with CI/CD pipelines and containerization technologies like Docker and Kubernetes