We are looking for a Machine Learning Engineer who sits at the intersection of robust system design, full-stack engineering, and MLOps. In this role, you won't just build features; you will architect highly reliable, deterministic software systems capable of running complex AI simulations at scale.
The ideal candidate combines strong backend expertise in Python and FastAPI with a deep operational mindset focused on MLOps engineering, infrastructure reliability, and rigorous automated testing.
Here is what you will do in this role:
Architect Reliable Backends: Design, implement, and maintain high-performance backend systems using Python and FastAPI, prioritizing deterministic execution, high availability, and fault tolerance.
Build Scale-Ready MLOps Pipelines: Own and optimize infrastructure for model evaluation and synthetic data pipelines. Implement and scale continuous integration and continuous deployment (CI/CD), along with continuous monitoring (CM), for ML workflows using AWS, Docker, Kubernetes, and Jenkins or GitHub Actions.
Enforce Rigorous Testing Standards: Champion a culture of reliability by designing and maintaining comprehensive automated test suites (unit, integration, end-to-end, and regression). Build frameworks to validate both standard software logic and non-deterministic ML model outputs.
Develop Responsive Frontends: Build scalable, highly responsive web interfaces using React/Next.js that allow users to seamlessly monitor simulations, visualize failure modes, and inspect synthetic data generation.
Optimize Data Infrastructure: Manage and optimize both SQL and NoSQL databases, ensuring efficient data ingestion, high-throughput query performance, and reliable storage for massive simulation datasets.
Collaborate on AI/ML Orchestration: Partner closely with research teams to deploy, monitor, and scale applications leveraging state-of-the-art NLP and LLM technologies.
There are a few specific things we’ll be looking for that will help you succeed in this role:
Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
MLOps & DevOps Core: Hands-on experience building and maintaining MLOps pipelines or cloud infrastructure using AWS, Docker, and Kubernetes. Deep familiarity with CI/CD automation is required.
Testing-First Mindset: Strong expertise in modern testing frameworks and a disciplined approach to Test-Driven Development (TDD). Experience testing data pipelines, microservices, or machine learning systems is highly valued.
Data Systems: Solid understanding of database design, schema optimization, and query tuning for both SQL and NoSQL databases under heavy data loads.
AI/LLM Exposure: Experience working on or supporting projects involving LLMs, NLP, or complex agentic workflows.
Startup Execution: Strong problem-solving skills, architectural intuition, and the ability to build highly predictable systems in a fast-paced, collaborative startup environment.
Bonus: Contributions to open-source software or a GitHub portfolio that showcases reliable infrastructure engineering.