Job Summary
A company is looking for a Member of Technical Staff, Pre-Training Data Engineer.
Key Responsibilities
- Design and build scalable data pipelines for diverse datasets, ensuring effective ingestion, cleaning, filtering, and optimization
- Conduct data ablations to assess quality and experiment with data mixtures to enhance model performance
- Develop robust data modeling techniques to structure datasets for optimal training efficiency
Required Qualifications
- Strong software engineering skills with proficiency in Python
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, or Pandas
- Experience working with large-scale datasets, including web and multilingual data
- Knowledge of data quality assessment techniques
- A passion for bridging research and engineering in AI model training