Software Engineer RL Environments - San Francisco, CA - $180K-$220K

About the Job

Location: San Francisco, CA (in-person)

Compensation: $180,000 - $220,000 base, plus substantial profit share and competitive equity (expected total cash compensation around $500,000)

Join a fast-growing AI infrastructure company as a Software Engineer focused on RL environments, designing the datasets and evaluation rubrics that directly shape how frontier AI models learn.

What You'll Do

- Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows

- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines

- Model annotator behavior and run experiments to improve specific model capabilities

- Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability

- Create and manage both real-world and synthetic data pipelines

- Partner with research teams at top AI labs to translate their training objectives into concrete data and evaluation specifications

What You'll Bring

- 1-4 years of software engineering experience with strong technical depth

- A genuine obsession with how data structure, selection, and quality drive model behavior

- The ability to design lightweight experiments, move fast, and extract actionable insights from messy results

- Comfort working across domains such as finance, software engineering, and policy

- A strong track record of shipping, with a clear bias toward building over theorizing

Nice to Have

- Prior work or internship at an RL environment company, AI safety organization, or benchmarking organization

- Experience as a founder or early engineer at an early-stage startup

- Experience building real-world and synthetic data pipelines

- Familiarity with RLHF or RLVR training pipelines

This is a high-leverage engineering seat with direct impact on how frontier AI models are trained, working hands-on with research teams at the world's leading AI labs.
