AI Engineer - Voice

Boost.ai group
1 day ago
Full-time
Remote friendly (Norway)
Worldwide
ML & AI Engineering
Are you ready to be at the bleeding edge of Voice AI, building systems that handle millions of real conversations per month where every millisecond counts?

At boost.ai, we are currently scaling up our Voice team. 2026 is the year of Voice!

With a steady 15-20% YoY growth rate at boost.ai, we are the go-to Conversational AI provider for regulated sectors such as finance, the public sector, insurance, and telecom, and we are expanding into many more industries, including healthcare.

We are not just another "wrapper" that popped up overnight.
We are a bona fide AI and Data Science company that has been pioneering machine learning since 2016.

Today, we are building a voice-first platform that millions of conversations will run through, and more than 600 of our virtual agents are live across EMEA and North America. Enterprises like Tryg, DNB, Telenor, and Nordea trust us with their customers.

By joining us, you are stepping into the core of a central, rapidly growing player in the global AI space.

You won't be building theoretical demos or waiting on roadmap slides.
You would be joining a team early enough in its journey that you can genuinely shape the product, yet established enough that real customers already depend on what you ship.

Our team is diverse, dedicated, and a little bit obsessed with getting voice right, and we work hard to make sure everyone here can do the best work of their career.

You will be on the bleeding edge, solving some of the hardest engineering challenges in low-latency computing and genuinely shaping a product that ships directly to massive global enterprises.

The Challenge: The Nordic Gap & Zero Forgiveness
Voice is our biggest bet and our fastest-growing product, but running speech models in a Jupyter notebook is entirely different from serving them live to millions of enterprise customers.

Your mission is to tackle the exact ML problems that the major tech labs quietly ignore.

You will take ownership of perfecting speech-to-text (STT) and text-to-speech (TTS) across the complex Nordic languages: curating datasets, fine-tuning open and proprietary models, and defining the gold standard for accuracy in languages with limited open-source coverage.

Simultaneously, you will be taking next-generation Speech-to-Speech (S2S) architectures straight from research papers into production.
When interacting with an AI voice, users have zero forgiveness for unnatural pauses. You will be forced to constantly balance model accuracy against aggressive millisecond-latency budgets and GPU economics.
It is a fast-moving, highly complex space where the research questions and the live production system are the same job.

Where you can work:
This role is hybrid and based in one of our European offices: Copenhagen, Oslo, Stavanger, Stockholm, Helsinki, or London.

Note: Eligibility to live and work in the Nordics or the UK is required. We are not able to support relocation from outside this region or fully remote arrangements for this position.
Note: We are open about seniority. Whether you are an experienced Applied ML Engineer or a Senior/Staff-level practitioner with a real track record in speech, we want to talk.

Requirements (The Non-Negotiables):
● Hands-on experience with speech models: Fine-tuning STT (Whisper, Parakeet, NeMo, or similar) and/or TTS (XTTS, VITS, StyleTTS, or commercial APIs).

● You have taken models all the way to production.

● Experience with PyTorch and modern inference stacks (Hugging Face, ONNX, Triton, or equivalent inference servers).

● Strong Python skills, comfortable across both the research workflow and writing production-grade code.

● A degree in Computer Science, Machine Learning, Computational Linguistics, Signal Processing, or a closely related field.

● Fluent in English and one or more Nordic languages.

Blow us away (Nice-to-haves):

● You have designed both objective and subjective evaluation methods for speech quality; you know that what you can measure, you can improve.

● Comfortable deploying and operating on AWS, Azure, or GCP, and familiar with GPU inference, model serving, and latency-critical workloads.

● Familiarity with Speech-to-Speech (S2S) architectures is a strong plus.

We are looking for a Voice AI Engineer to lead the next generation of our voice platform. This role sits right where applied ML meets platform engineering. If you want the satisfaction of watching your models come to life rather than sit in a Jupyter notebook, you will feel right at home here.

You will:

Own Nordic language coverage. Drive STT and TTS quality across the Nordic languages, including the underserved ones the major labs quietly deprioritize. You will build dataset curation pipelines, fine-tune open and proprietary speech models, and own the bar for what "good" sounds like.

Bring speech-to-speech to production. Take next-generation speech-to-speech architectures from prototype to production: latency-aware, cost-aware, and evaluation-driven from day one.

Build the evaluation harness. Design the objective metrics, listening tests, and regression harnesses that let us swap, tune, and upgrade speech models with confidence. We genuinely believe the eval harness matters just as much as the model checkpoint.

Shape the inference architecture. Partner with platform engineering on model routing, multi-region deployment, latency optimization, and the CPU/GPU economics of real-time speech.

Stay at the frontier. Keep up with the fast-moving world of speech and multimodal AI (Whisper, Parakeet, XTTS, and S2S models coming out of OpenAI, Meta, Google, ElevenLabs, and the open-source community) and bring the advances that matter onto our roadmap.

Work across teams. Collaborate with our Voice, ML, Core Platform, and customer-facing teams to turn language requirements and customer needs into model work that actually ships.

We work hard on tough challenges, but fun is one of our actual core values.

We are an opinionated, high-trust team that celebrates wins together (our annual Boost Camp every October is a massive highlight!).
You will have the autonomy and trust to make a real difference to the direction of our voice technology.
To back you up, we offer a comprehensive range of employee benefits designed to support your overall well-being and allow you to do the absolute best work of your career.