
New Grad Machine Learning Engineer

Preference Model
2 days ago
Full-time
On-site
San Francisco, California, United States
ML & AI Engineering

About Us

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because many of the tasks we want these models to perform are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.

Our founding team previously worked on Anthropic’s data team, building the data infrastructure, tokenizers, and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. We are backed by a16z.

About the Role

This is an entry-level engineering role built for new or recent graduates who are eager to work on one of the most technically exciting problems in AI: building the RL environments that teach frontier models how to think and reason.

You'll join a small, high-ownership team and contribute directly to the infrastructure that powers our reinforcement learning training pipeline. You don't need years of production experience — we're looking for strong fundamentals, a genuine interest in RL, and the drive to learn fast in a startup environment. The work spans:

  • Building and scaling distributed training systems in PyTorch

  • Designing automation for monitoring, debugging, and recovery in large-scale training runs

  • Working directly with researchers to translate RL training experiments into reliable infrastructure

  • Improving the performance and reliability of GPU/TPU workloads

About You

We're looking for new or recent grads (BS, MS, or PhD) in CS, ML, or a related field who are enthusiastic about RL and AI infrastructure.

Technical Skills Required:

  • Solid Python programming skills and comfort with systems-level thinking

  • Familiarity with PyTorch and a conceptual understanding of distributed training

  • Coursework or project experience with reinforcement learning (even toy environments count)

  • Exposure to any modern RL framework is a plus — verl, NeMo-RL, ART, Atropos, or similar

  • Basic familiarity with containers (Docker) or orchestration concepts (Kubernetes, Ray) is helpful but not required

What Makes You Successful:

  • Strong systems thinking with the ability to design for scale

  • Excellent debugging skills across the entire stack

  • Collaborative mindset with strong communication skills to work effectively with researchers and engineers

  • Self-directed problem solver who takes ownership and drives solutions end-to-end

  • Passion for staying current with the rapidly evolving ML infrastructure landscape

Nice to Have

  • Research projects, coursework, or personal work involving RL environments (any framework, any scale)

  • Open-source contributions to ML infrastructure or RL tooling

  • Experience with any cloud platform (AWS, GCP, Azure) or infrastructure-as-code tools


We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.

We are backed by a Tier 1 VC. We offer a competitive base salary as well as generous equity (>90th percentile).