
RL Environments Engineer - Low-Level Engineering & Kernel Inference Optimization (Remote, Contractor)

About the Company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful, but they fall short of their potential across diverse use cases because many of the tasks we want to use them for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.

Our founding team previously worked on Anthropic's data team, building the data infrastructure, tokenizers, and datasets behind the Claude models. We are partnering with leading AI labs to push AI closer to its transformative potential.

The company is backed by a Tier 1 Silicon Valley VC.

Brief Description of the Role

We're hiring Low-Level Engineers to design and build RL environments that teach LLMs kernel development, hardware optimization, and systems programming. The goal is to create realistic feedback loops where models learn to write high-performance code across GPU and CPU architectures.

This is a remote contractor role. It requires at least 4 hours of working overlap with PST and advanced English (C1/C2).
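To make the "realistic feedback loop" idea concrete, here is a minimal sketch of what such an environment could look like. Everything in it is an assumption for illustration, not the company's actual API: the class name `KernelOptEnv`, its `step` signature, and the toy reward are all hypothetical, and a real environment would compile and benchmark GPU code in a sandbox rather than time a Python callable.

```python
import time


class KernelOptEnv:
    """Hypothetical sketch of an RL environment for kernel optimization.

    The model submits a candidate implementation; the environment runs it,
    checks correctness, and returns a reward based on measured speedup
    relative to a baseline. All names and signatures here are illustrative
    assumptions, not the actual environment interface.
    """

    def __init__(self, baseline_seconds: float):
        # Wall-clock time of the reference (unoptimized) implementation.
        self.baseline = baseline_seconds

    def step(self, candidate_fn, test_input):
        # A real environment would compile and execute GPU code in a
        # sandbox; here we simply time an arbitrary Python callable.
        start = time.perf_counter()
        output = candidate_fn(test_input)
        elapsed = time.perf_counter() - start

        # Toy correctness check: the task here is "sort the input".
        correct = output == sorted(test_input)
        reward = (self.baseline / max(elapsed, 1e-9)) if correct else -1.0
        done = True  # single-shot episode in this sketch
        observation = {"correct": correct, "seconds": elapsed}
        return observation, reward, done
```

A model interacting with the environment would call something like `env.step(my_kernel, data)` and use the returned reward to improve its next candidate.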

Requirements

Minimum Qualifications

  • Strong Python (engineering-quality, not notebook-only)
  • Production mindset (debugging, reliability, iteration speed)
  • Clear understanding of LLMs and their current limitations
  • Ability to meet throughput expectations and respond quickly to feedback

You may be a good fit if one or more of the following apply:

  • Deep understanding of memory hierarchies (registers, L1/L2/shared memory, HBM, system RAM) and their performance implications
  • Threading models, synchronization primitives, and concurrent programming (warps, thread blocks, barriers, atomics)
  • Cache coherence, memory access patterns, coalescing, and bank conflicts
  • JIT compilation frameworks (e.g., Triton, JAX/XLA, TorchInductor, Numba)
  • AOT compilation and optimization passes (LLVM, MLIR, TVM)
  • Compiler and kernel frameworks such as CUTLASS, BitBLAS, or JAX/Pallas
  • Modern C++, including templates, concurrency, and build systems
  • Assembly-level programming and low-level optimization across GPU and CPU architectures (e.g., x86, ARM, NVIDIA Hopper, NVIDIA Blackwell)
  • Debugging and optimizing GPU kernels using CUDA and/or HIP/ROCm
  • Developing PyTorch custom operators, backend extensions, or dispatcher integrations (e.g., ATen, TorchScript, or custom backends)
  • Customizing, extending, or optimizing vLLM, including distributed inference workflows
  • GPU communication libraries and collectives, such as NVIDIA NCCL, AMD RCCL, MPI, or UCX
  • Mixed-precision and low-precision kernels (e.g., FP16, BF16, FP8, INT8), including numerical stability and performance trade-offs
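As a small illustration of the numerical-stability trade-offs mentioned in the last bullet, the sketch below shows why low-precision kernels typically accumulate in a wider type. It uses NumPy on the CPU purely for demonstration (an assumption of this example; the role itself concerns GPU kernels): summing many small FP16 values in an FP16 accumulator drifts badly, while accumulating the same FP16 inputs in FP32 stays accurate.

```python
import numpy as np

# 4096 copies of 0.1 stored in FP16 (each is really ~0.0999756).
x = np.full(4096, 0.1, dtype=np.float16)

# Naive FP16 accumulator: once the running sum is large, adding ~0.1
# falls below half the FP16 spacing and is rounded away entirely.
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

# Same FP16 inputs, but accumulated in FP32.
acc32 = x.astype(np.float32).sum()

# Reference value computed in FP64.
expected = float(x.astype(np.float64).sum())
err16 = abs(float(acc16) - expected)
err32 = abs(float(acc32) - expected)
print(err16, err32)  # the FP16 accumulation error is far larger
```

This is the same trade-off that motivates FP32 accumulators inside FP16/BF16/FP8 matmul kernels: inputs stay narrow for bandwidth, while partial sums stay wide for stability.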

Compensation

  • Hourly contractor rate: $50-$150 USD/hour, depending on expertise level and the quality of the take-home assignment.

Context and Take-home Assignment

  • A take-home coding assignment is required as part of the evaluation; details will be provided upon application.