Senior AI Observability Engineer

7/22/2025

Remote

Job Summary

A company is looking for a Senior AI Observability Engineer to architect and implement distributed observability systems for AI and HPC clusters.

Key Responsibilities
  • Collaborate with engineering and research teams to deliver observability solutions for AI/HPC clusters
  • Develop, test, and deploy data collectors, pipelines, and visualization services
  • Define data collection and retention policies to optimize network bandwidth and storage costs

Required Qualifications
  • Experience developing large-scale, distributed observability systems
  • Proficiency in Python programming and API usage
  • Experience with observability platforms such as Apache Spark, Elastic/OpenSearch, and Grafana
  • MS (preferred) or BS in Computer Science, Electrical Engineering, or related field
  • 8+ years of proven experience in relevant fields
