What you will own
You will be responsible for one thing:
Make our AI outputs reliable, fast, and indispensable in real workflows.
Concretely:
Design and evolve our LLM / agent architecture
Own output quality across key use cases (emails, document analysis, etc.)
Build evaluation systems (datasets, metrics, regression detection)
Drive fast iteration loops from production data
Improve retrieval, reasoning, and tool usage
Ensure production reliability (latency, failure modes, fallbacks)
Work directly with the product team and founders on what to build and why
What this role is really about
Most AI teams fail because:
they don’t know what “good output” means
they don’t have evals
they iterate randomly
they overuse agents
Your job is to fix that.
You will turn:
vague user problems
→ into structured AI systems
→ with measurable performance
→ that improve every week