Design, deploy, and maintain scalable AI and machine learning systems that deliver secure, reliable, and high-performing solutions. Ensure efficient model serving, deployment, and monitoring, and maintain operational excellence across AI environments.
· Design, deploy, and maintain scalable AI/ML systems and infrastructure.
· Develop and manage MLOps pipelines for automated model deployment and monitoring.
· Ensure the performance, reliability, security, and scalability of AI platforms.
· Deploy, serve, and optimize machine learning and generative AI models for production environments.
· Build and maintain CI/CD pipelines for AI applications.
· Manage containerized AI applications using Docker and Kubernetes.
· Collaborate with data scientists, software engineers, and business stakeholders to operationalize AI solutions.
· Monitor AI system performance, reliability, and availability, and implement continuous improvements.
· Troubleshoot production issues and optimize AI infrastructure.
· Bachelor's degree in Computer Science, Artificial Intelligence, Data Science, Software Engineering, or a related field.
· 3–8 years of experience in AI systems engineering, MLOps, or machine learning platform engineering.
· Strong programming skills in Python.
· Experience with cloud platforms such as Microsoft Azure, AWS, or Google Cloud Platform.
· Hands-on experience with Docker, Kubernetes, and containerized deployments.
· Experience designing and maintaining CI/CD pipelines.
· Knowledge of distributed systems and scalable AI infrastructure.
· Experience deploying and operationalizing machine learning and generative AI solutions.
· Experience with Azure Machine Learning, Amazon SageMaker, or Google Vertex AI.
· Experience with Infrastructure as Code (Terraform or similar).
· Familiarity with AI monitoring, observability, and model lifecycle management.
· Relevant cloud or AI certifications.