Staff Software Engineer - Agent Runtime & Infrastructure
Job Description:
The Role
You'll own two critical workstreams — the agent runtime and backend infrastructure powering every trade in our fleet, and the migration of model hosting and agent deployment to fully in-house infrastructure. This is staff-level ownership from architecture through to 3am incident response.
What You'll Build
Agent Runtime & Backend (50%)
- Plugin runtime — per-agent position tracking, trailing stop execution, and DSL state management
- Scanner gateway and rules engine — YAML-configurable evaluation layer between signals and execution
- Centralised profit-trailing service — protecting open positions even when agents are offline
- Execution layer — the MCP server bridging agents to 48+ platform tools, including position creation, market data, and exchange state
- Real-time data pipelines — enriched intelligence flowing through Redis, Postgres, and ClickHouse
Model & Agent Hosting Migration (30%)
- Migrate agent deployment to fully owned infrastructure — isolated workspaces, cron scheduling, state persistence, and one-command skill deployment
- Lead the move from external LLM APIs to self-hosted inference — own the decision and the execution
- Build agent telemetry to capture every trade decision, scanner evaluation, and signal score across the fleet
- Build zero-downtime CI/CD pipelines for shipping updates to 50+ live agents without exposing open positions
Infrastructure & Operations (20%)
- Monitoring and alerting for agent failures, orphaned positions, and state corruption
- Cloud infrastructure management on AWS/EKS with infrastructure-as-code
- Own incident response — in a live trading system, every minute of downtime is real capital at risk
What You Bring
Must-haves:
- Strong production backend engineering experience in at least two of Go, Python, and Node.js/TypeScript (Go preferred)
- Experience building backend services from scratch — APIs, job scheduling, state management, distributed systems
- Solid understanding of real-time, low-latency systems: WebSockets, sub-second evaluation, condition-based triggers
- Production experience with Postgres, Redis, and an analytics DB such as ClickHouse or BigQuery
- Kubernetes experience — deploying, scaling, and debugging on AWS EKS
- You have owned a system end-to-end — designed, built, deployed, operated, and fixed it under pressure
Strong plus:
- Experience with LLM infrastructure — model serving, inference optimisation, vLLM, TGI, or managed endpoints
- Background in trading systems, exchange APIs, or fintech where uptime has direct financial consequences
- Onchain infrastructure experience — wallet operations, RPC nodes, DEX integration
- Experience building multi-agent platforms or CI/CD pipelines for live trading systems
This is not a DevOps role. You'll spend 80% of your time writing code that ships to production — because at our stage, the best person to operate a system is the person who built it. If you are a backend engineer who wants to build the foundational infrastructure for a new category of autonomous financial software, this is your role.
Required Skills:
Ethereum Solidity