Senior AI Engineer

Citizen Health

1 day ago

Full-time

Remote friendly (San Francisco, California, United States)

Worldwide

ML & AI Engineering

The Problem

We're building an agentic AI caregiver advocate that lives in your text messages and acts on behalf of rare disease patients and their families. It doesn't just answer questions. It makes phone calls to insurance companies, automates browser sessions to find clinical trials, prepares patients for specialist appointments using their complete medical history, navigates government benefits applications, and coordinates across multiple sub-agents to get things done. It maintains persistent memory across months of interactions and gets smarter about each patient's condition, care team, and needs over time.

The AI engineering challenge: build an autonomous agent that operates reliably in one of the highest-stakes domains imaginable — where a wrong answer about a medication interaction or a missed insurance deadline has real consequences for a sick child and their family. Traditional chatbot architecture doesn't cut it. We need engineers who can design agentic systems that reason, plan, use tools, and know when to escalate to a human.

The Role

We're hiring AI Engineers to build the intelligence behind our AI advocate. You'll design and implement the agentic LLM systems, RAG pipelines, tool-use frameworks, and evaluation infrastructure that make it effective, reliable, and safe. This is not a research role — you'll ship production systems that real patients depend on daily.

You'll work closely with Product, Platform Engineering, and clinical advisors. You'll report to the CTO.

What You'll Build

Agentic orchestration — Design and implement the multi-agent architecture that powers our advocate's ability to reason, plan, and execute complex multi-step tasks (scheduling appointments, filing insurance appeals, researching treatment options) with appropriate guardrails and human-in-the-loop checkpoints.

RAG and medical knowledge systems — Build retrieval pipelines over patient medical records, condition-specific knowledge bases, and community-sourced data from 125+ rare disease communities. Ensure the advocate's responses are grounded in the patient's actual medical history, not generic information.

Tool-use and integrations — Build and maintain the frameworks that let our agent take real-world actions: browser automation, phone calls, form filling, document generation, and API integrations with healthcare systems. Every tool the agent uses must be reliable, auditable, and scoped to the right permissions.

Conversation intelligence — Design the systems that make the advocate's WhatsApp conversations feel natural, contextual, and proactive. This includes long-term memory, conversation state management, personalization, and knowing when to surface information the patient didn't ask for but needs.

Evaluation and safety — Build evaluation frameworks for agentic outputs in a domain where "good enough" isn't good enough. Develop testing infrastructure for multi-step agent workflows, hallucination detection, medical accuracy validation, and regression testing across model updates.

Prompt engineering and model optimization — Design prompt architectures that balance capability, cost, and latency. Evaluate and integrate new models as the landscape evolves. Optimize inference costs — our unit economics depend on it.

Who You Are

You've built LLM-powered systems that went to production and had real users, not just demos and prototypes. You understand that the hard part of agentic AI isn't getting a cool demo working; it's making the system reliable at the 95th percentile. You've dealt with the unglamorous parts: evaluation frameworks, prompt regression testing, cost optimization, and edge cases that only show up at scale.

You care about the craft of building AI systems, not just the hype cycle. You've formed opinions about when to use agents vs. chains vs. simple prompts, and you can articulate the tradeoffs. You're comfortable working in a startup where the stack is evolving fast and the best architecture for next quarter might not be the one you built last quarter.

Must-Have Skills

4+ years of software engineering experience, with 2+ years building LLM-powered applications in production.
Strong proficiency in Python and hands-on experience with agentic AI frameworks (LangChain, LangGraph, CrewAI, AutoGen, or equivalent).
Experience designing and implementing RAG systems — including chunking strategies, embedding models, vector databases, retrieval evaluation, and hybrid search.
Track record building AI systems that take real-world actions (tool-use, function calling, browser automation, API orchestration), not just generate text.
Strong software engineering fundamentals — you write production-grade code with tests, logging, error handling, and clear abstractions. You're an engineer who uses AI, not a prompt writer who avoids code.
Experience with evaluation and testing of LLM systems — you've built evals, measured hallucination rates, and designed regression suites for non-deterministic outputs.
Comfort with ambiguity and fast iteration. You've worked at a startup or early-stage company and thrived.
Strong written and verbal communication — you can explain agentic architecture tradeoffs to a PM and debug a prompt chain with an engineer in the same afternoon.

Preferred Skills

Experience with healthcare, life sciences, or other regulated domains where AI output accuracy has high stakes (HIPAA compliance, clinical data handling).
Familiarity with multi-agent architectures — designing systems where multiple specialized agents collaborate, hand off tasks, and share context.
Experience with WhatsApp Business API, messaging platforms, or conversational AI at scale.
Background in inference cost optimization — model selection, caching, batching, routing between models of different capability/cost profiles.
Familiarity with medical data formats (FHIR, HL7, CDA) or experience building systems over medical records.
Experience with voice AI, telephony APIs, or browser automation frameworks (Playwright, Puppeteer).
Contributions to open-source AI/LLM tooling or active engagement with the agentic AI community.

Who We Are

Citizen Health was founded on the belief, shaped by firsthand lived experience navigating healthcare, that having the right advocate is the single most important factor in achieving better care and outcomes. By uniquely combining data, AI, and community, Citizen is building a personalized AI advocate powered by patients' complete medical histories and data from thousands of other patients, generating personalized insights for clinical decisions and day-to-day challenges. Citizen is starting with rare and complex conditions, where patients share their data in exchange for value. This gives biopharma companies and researchers seamless access to patients and regulatory-grade data, shaving years off the development of much-needed treatments.

Citizen was founded by experienced entrepreneurs with multiple successes under their belts and is funded by top-tier investors including 8VC, Transformation Capital, and Headline Ventures, among others. We are a mission-driven team excited to be building the future of consumer healthcare.

Why Join Us

Build an agentic AI system in production that actually helps people — not another chatbot, not a copilot for knowledge workers, but an autonomous advocate for families navigating rare diseases.
Work on problems at the frontier of agentic AI: multi-step planning in high-stakes domains, tool-use over messy real-world systems, long-term memory and personalization, evaluation of non-deterministic autonomous agents.
Ship to real users immediately. Our AI advocate is live and serving patients today. Your code will be in production within your first sprint.
A fast-paced, high-growth setting with expanding opportunities and clear paths for career progression.
A supportive culture of learning, experimentation, and innovation, where new ideas are encouraged and explored.

Compensation & Benefits

Competitive salary + equity
Comprehensive health, dental, and vision insurance
Unlimited PTO, including 12 weeks of paid parental leave
Flexible remote-first work environment
Work from anywhere 30 days per year
Company offsites

Our Commitment to Diversity & Inclusion: Citizen Health is proud to be an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants of all backgrounds, identities, and experiences. Everyone deserves an equal chance to contribute and grow here — regardless of race, gender identity, sexual orientation, religion, national origin, age, disability, or veteran status.

Don't meet every qualification? No worries! We believe that passion, curiosity, and the right mindset are just as important as a checklist of skills. If you’re excited about what we’re building, we encourage you to apply.
