AI Infrastructure Engineer｜AIインフラ基盤

Earthbrain

Full-time

On-site

Minato-ku, Tokyo, Japan

MLOps & AI Infrastructure

"Building the foundation that delivers AI to the field." | "AIを、現場に届ける基盤をつくる。"

(Japanese version follows English)

About EARTHBRAIN

EARTHBRAIN develops and provides the Smart Construction® series, digitizing key construction site operations and transforming infrastructure development worldwide.

As of 2025, the series has been deployed at approximately 10,000 sites in Japan and overseas, giving it one of the top track records among construction ICT solutions in Japan.

Background

EARTHBRAIN is advancing R&D across multiple AI domains including LLMs, 3D vision, point cloud recognition, and Agentic AI, with a steadily growing portfolio of AI-powered products.

As the business scales, strengthening infrastructure, deployment pipelines, and operational design for agentic systems has become a key priority for delivering AI technologies to Smart Construction® more rapidly and reliably.

This role designs and builds the infrastructure to deliver models and engines produced by the AI team to Smart Construction® at production quality, establishing AI infrastructure best practices for EARTHBRAIN.

Responsibilities

Design, build, and optimize ML model serving infrastructure on GPU environments
Introduce container orchestration (ECS / EKS / GKE / Cloud Run, etc.) and design how it is operated
Design and build CI/CD pipelines (model versioning, testing, deployment automation)
Build and optimize agentic orchestration systems
Build model monitoring and observability infrastructure (accuracy degradation detection, data drift monitoring)
Design and execute a multi-cloud strategy, including optimizing the AWS environment and adopting GCP
Establish deployment standards and best practices for AI product integration
Collaborate with AI engineers and researchers, serving as a technical bridge to product development teams

Required Skills & Experience

Experience designing and operating container orchestration (at least one of ECS / EKS / GKE / Cloud Run)
3+ years of experience building and operating infrastructure on AWS (5+ years preferred)
Experience deploying ML models or building inference infrastructure on GPU environments
Experience designing and building CI/CD pipelines
Deep understanding of Docker and container technologies
Technical communication in English (CEFR C1 or above)
Japanese communication ability (JLPT N3–N2 equivalent or above)

Note: CVs and interviews for this position will be conducted in English.

Preferred Skills & Experience

Experience building and operating on GCP, including designing multi-cloud environments
Hands-on experience with MLOps tools (MLflow, Kubeflow, SageMaker, Vertex AI, etc.)
Infrastructure management with IaC (Terraform / Pulumi, etc.)
Experience building model monitoring / data drift detection systems
Experience designing and implementing inference APIs (FastAPI / gRPC, etc.)
Cost optimization experience (reserved instances, spot instances, GPU sharing, etc.)
Experience building and operating agentic AI frameworks (LangGraph, CrewAI, etc.)

Ideal Candidate

Motivated by delivering AI to production; passionate about bridging research and implementation
Capable of designing and proposing best practices in greenfield environments
Able to communicate effectively with both AI engineers and product engineers
Stays positive under uncertainty and builds out infrastructure step by step

Why This Role

Greenfield architecture

Design AI infrastructure from scratch without legacy constraints. You define the architecture and set the standards.

Front line of construction digital transformation (DX)

Deliver AI to products used at approximately 10,000 sites worldwide. The infrastructure you build has a direct impact on real construction sites.

Career growth

Grow from infrastructure into defining AI infrastructure standards, governance, and strategy, much as SRE has expanded into security and platform engineering.

Multinational team

English is the primary working language within the AI team, so you can apply your technical skills in a truly global engineering environment.

Tech Stack

Container / Orchestration

Docker, ECS, EKS, GKE, Cloud Run

CI/CD

GitHub Actions, Argo CD, model versioning pipelines

Cloud

AWS (primary), GCP (multi-cloud)

ML Serving

GPU inference, TorchServe, Triton Inference Server

MLOps

MLflow, Kubeflow, SageMaker, Vertex AI

Monitoring

Prometheus, Grafana, data drift detection

IaC

Terraform, Pulumi

Inference API

FastAPI, gRPC

Agentic AI

LangGraph, CrewAI, orchestration frameworks

EARTHBRAINについて

株式会社EARTHBRAINは、建設現場の主要な作業をデジタル化する「Smart Construction®シリーズ」を開発・提供し、世界のインフラづくりを変革しています。

2025年時点で、国内外約1万件の現場に導入され、建設ICTソリューションとして国内トップクラスの実績を誇ります。

募集背景

EARTHBRAINでは、LLM・3Dビジョン・点群認識・Agentic AIなど、複数のAI技術領域で研究開発を進めており、AIプロダクトの実績も着実に積み上がっています。

事業成長に伴い、これらのAI技術をSmart Construction®により迅速かつ安定的に届けるための次のステージとして、インフラ基盤・デプロイパイプライン・エージェンティックシステムの運用設計の強化が重要テーマとなっています。

本ポジションは、AIチームが生み出すモデルやエンジンをプロダクション品質でSmart Construction®に届けるための基盤を設計・構築し、EARTHBRAINとしてのAIインフラのベストプラクティスを確立していく役割です。

業務内容

GPU環境でのMLモデルサービング基盤の設計・構築・最適化
コンテナオーケストレーション（ECS / EKS / GKE / Cloud Run等）の導入と運用設計
CI/CDパイプラインの設計・構築（モデルのバージョン管理、テスト、デプロイ自動化）
エージェンティック・オーケストレーションの構築・最適化
モデルモニタリング・オブザーバビリティ基盤の構築（精度劣化検知、データドリフト監視）
AWS環境の最適化およびGCP活用を含むマルチクラウド戦略の設計・実行
AIプロダクト適用に関するデプロイ規約・ベストプラクティスの策定
チーム内のAIエンジニア・リサーチャーと連携し、プロダクト開発チームとの技術的な橋渡しを担う

必須スキル・経験

コンテナオーケストレーション（ECS / EKS / GKE / Cloud Runいずれか）の設計・運用経験
AWSでのインフラ構築・運用経験（3年以上、5年以上歓迎）
GPU環境でのMLモデルデプロイまたは推論基盤の構築経験
CI/CDパイプラインの設計・構築経験
Docker / コンテナ技術に関する深い理解
英語でのテクニカルコミュニケーション能力（CEFR C1以上）
日本語でのコミュニケーション能力（N3～N2相当以上）

※ 本ポジションの書類選考（CV）および面談・面接は英語で実施いたします。

歓迎スキル・経験

GCPでの構築・運用経験（マルチクラウド環境の設計経験）
MLOpsツール群（MLflow、Kubeflow、SageMaker、Vertex AI等）の実務経験
IaC（Terraform / Pulumi等）によるインフラ管理経験
モデルモニタリング / データドリフト検知基盤の構築経験
FastAPI / gRPCなど推論APIの設計・実装経験
コスト最適化（リザーブドインスタンス、スポットインスタンス、GPU共有等）の経験
エージェンティックAIフレームワーク（LangGraph、CrewAI等）の構築・運用経験

求める人物像

「AIをプロダクトに届ける」ことに価値を感じ、研究と実装の橋渡しに意欲がある方
既存の仕組みがない環境で、自らベストプラクティスを設計・提案できる方
AIエンジニアとプロダクトエンジニア、双方と対等にコミュニケーションできる方
不確実な状況でも前向きに取り組み、段階的に基盤を整えていける方

このポジションの魅力

ゼロからのアーキテクチャ設計

レガシー制約なく、AIインフラの設計思想を自ら描ける

建設DXの最前線

世界1万件の現場に使われるプロダクトにAIを届ける当事者になれる

キャリアの成長余地

SREがセキュリティ・ガバナンスへ進化したように、AIインフラの規約・戦略策定へとキャリアを拡張できるポジション

多国籍チームでの協働

AIチーム内は英語メイン。グローバルな環境で技術力を発揮できる

技術スタック

コンテナ / オーケストレーション

Docker, ECS, EKS, GKE, Cloud Run

CI/CD

GitHub Actions, ArgoCD, モデルバージョニングパイプライン

クラウド

AWS（プライマリ）, GCP（マルチクラウド）

MLサービング

GPU推論, TorchServe, Triton Inference Server

MLOps

MLflow, Kubeflow, SageMaker, Vertex AI

モニタリング

Prometheus, Grafana, データドリフト検知