Lead AI Platform Engineer (AWS Bedrock/Step Functions) - Full Remote Portugal
ABOUT THE OPPORTUNITY
We are partnering with an innovative international technology company focused on building scalable AI-native platforms that support high-performance digital products used worldwide. This is an opportunity for a senior engineering leader who wants to shape the future of enterprise AI infrastructure, autonomous agent workflows, and cloud-native distributed systems.
As a Lead AI Platform Engineer, you will work closely with executive leadership and engineering teams to define and implement the architecture behind advanced AI solutions. The role combines hands-on technical leadership with platform strategy, reliability engineering, and internal developer experience improvements. You will play a critical role in enabling secure, scalable, and production-ready AI ecosystems while mentoring engineering teams and driving technical excellence across the organization.
This is a fully remote position based in Portugal, with occasional national and international travel estimated at 0%–15%.
PROJECT & CONTEXT
The project focuses on designing and evolving a modern AI platform ecosystem powered by AWS cloud technologies and agent-based architectures. The engineering environment is highly collaborative, fast-paced, and centered around cloud automation, observability, and AI workload orchestration.
You will lead initiatives involving AWS Bedrock AgentCore, AWS Step Functions, multi-tenant AI infrastructure, vector databases, and automated CI/CD pipelines for AI workloads. The platform supports intelligent agent execution, evaluation pipelines, retrieval-augmented generation (RAG), and internal developer platforms (IDP).
The stack includes AWS Networking and IAM, Terraform v1.x, GitHub Actions, Kubernetes, Docker, Python 3.x, Bash scripting, JavaScript/TypeScript, OpenSearch, Pinecone, Milvus, Datadog, CloudWatch, LangSmith, Vault, Artifactory, Backstage, and policy enforcement frameworks such as OPA and Cedar.
English is required for daily communication with international stakeholders and distributed engineering teams.
WHAT WE'RE LOOKING FOR (Required)
- Strong experience designing and operating cloud-native platforms on AWS
- Proven expertise with AWS Bedrock AgentCore and AWS Step Functions in production environments
- Experience building custom Agent/Tool Gateways and AI orchestration workflows
- Advanced knowledge of Infrastructure as Code using Terraform v1.x
- Hands-on experience with CI/CD automation using GitHub Actions
- Strong containerization and orchestration experience with Docker and Kubernetes
- Experience building scalable microservices architectures
- Solid understanding of AI observability, monitoring, and tracing tools such as Datadog, CloudWatch, or LangSmith
- Experience with vector databases including OpenSearch, Pinecone, or Milvus
- Strong understanding of RAG architectures and AI knowledge retrieval strategies
- Experience implementing secure multi-tenant environments and IAM policies
- Familiarity with policy and governance frameworks such as OPA or Cedar
- Strong scripting and automation skills using Python 3.x, Bash, or JavaScript/TypeScript
- Experience collaborating with senior stakeholders and translating technical concepts into business value
- Excellent communication skills in English (written and spoken)
NICE TO HAVE (Preferred)
- Experience with Internal Developer Platforms (IDP) and developer enablement initiatives
- Familiarity with Backstage for platform engineering and developer experience improvements
- Experience implementing AI model evaluation frameworks (Evals)
- Knowledge of analyzing non-deterministic AI agent behavior and of reliability engineering practices
- Previous experience mentoring engineering teams or acting as a technical lead
- Exposure to enterprise security tooling such as Vault and Artifactory
- Experience working in distributed international teams
- Portuguese language skills are considered a plus
- Experience supporting large-scale AI-native or autonomous agent ecosystems