Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fall short of their potential across diverse use cases because many of the tasks we want to use these models for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.
Our founding team previously worked on Anthropic's data team, building the data infrastructure, tokenizers, and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. We are backed by a16z.
This is an entry-level engineering role built for new or recent graduates who are eager to work on one of the most technically exciting problems in AI: building the RL environments that teach frontier models how to think and reason.
You'll join a small, high-ownership team and contribute directly to the infrastructure that powers our reinforcement learning training pipeline. You don't need years of production experience — we're looking for strong fundamentals, a genuine interest in RL, and the drive to learn fast in a startup environment. The work spans:
Building and scaling distributed training systems in PyTorch
Designing automation for monitoring, debugging, and recovery in large-scale training runs
Working directly with researchers to translate RL training experiments into reliable infrastructure
Improving the performance and reliability of GPU/TPU workloads
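To give a flavor of the recovery-automation work above, here is a toy sketch in pure Python. The checkpoint format, file path, and simulated failure are invented for illustration and are not our actual stack; real runs would checkpoint model and optimizer state, not a single number.

```python
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "toy_run.ckpt")

def save_ckpt(step, state):
    # Write atomically: a crash mid-write never corrupts the checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_ckpt():
    # Resume from the last checkpoint, or start fresh.
    if not os.path.exists(CKPT):
        return 0, 0.0
    with open(CKPT) as f:
        d = json.load(f)
    return d["step"], d["state"]

def train(total_steps, crash_at=None):
    step, loss_sum = load_ckpt()
    while step < total_steps:
        if step == crash_at:
            raise RuntimeError("simulated node failure")
        loss_sum += 1.0 / (step + 1)  # stand-in for one training step
        step += 1
        save_ckpt(step, loss_sum)
    return step, loss_sum

# Supervisor loop: run the "job", then restart it after a simulated failure.
if os.path.exists(CKPT):
    os.remove(CKPT)
try:
    train(10, crash_at=5)
except RuntimeError:
    pass  # a real supervisor would log the failure and reschedule
final_step, _ = train(10)  # resumes at step 5 and finishes the run
```

The point of the sketch is the shape of the problem: failures are expected, so checkpointing must be atomic and restarts must be automatic rather than manual.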
We're looking for new or recent grads (BS, MS, or PhD) in CS, ML, or a related field who are enthusiastic about RL and AI infrastructure. You should bring:
Solid Python programming skills and comfort with systems-level thinking
Familiarity with PyTorch and a conceptual understanding of distributed training
Coursework or project experience with reinforcement learning (even toy environments count)
Exposure to any modern RL framework is a plus — verl, NeMo-RL, ART, Atropos, or similar
Basic familiarity with containers (Docker) or orchestration concepts (Kubernetes, Ray) is helpful but not required
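"Even toy environments count" really is the bar. The following is a minimal sketch of that level of project, in pure Python with no frameworks: a five-cell corridor environment with a Gym-style reset/step interface (an illustrative convention, not a required one), solved with tabular Q-learning.

```python
import random

class Corridor:
    """Toy environment: agent starts at cell 0; reward +1 for reaching cell 4."""
    N = 5

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: 0 = move left, 1 = move right; position is clamped to the grid
        self.pos = max(0, min(self.N - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.N - 1
        return self.pos, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    random.seed(seed)
    env = Corridor()
    Q = [[0.0, 0.0] for _ in range(Corridor.N)]  # Q[state][action]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = env.step(a)
            # temporal-difference update toward the bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy per cell; it should learn to prefer "right" in non-terminal cells.
policy = [max((0, 1), key=lambda a, s=s: Q[s][a]) for s in range(Corridor.N)]
```

A project at this scale, written and debugged yourself, demonstrates exactly the fundamentals we care about: environment interfaces, reward shaping, and value updates.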
Strong systems thinking with the ability to design for scale
Excellent debugging skills across the entire stack
Collaborative mindset with strong communication skills to work effectively with researchers and engineers
Self-directed problem solver who takes ownership and drives solutions end-to-end
Passion for staying current with the rapidly evolving ML infrastructure landscape
Research projects, coursework, or personal work involving RL environments (any framework, any scale)
Open-source contributions to ML infrastructure or RL tooling
Experience with any cloud platform (AWS, GCP, Azure) or infrastructure-as-code tools
We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.
Backed by a top-tier VC, we offer a competitive base salary as well as generous equity (>90th percentile).