FRONTIER LLM POST-TRAINING · RL TRAINING ENVIRONMENTS

Hi, I’m Bill. I turn new ideas into AI that ships.

I’m drawn to problems nobody has cracked yet, and I build for durable, long-term impact — originating new directions, then driving them to production. Frontier-model post-training at Amazon AGI today; the AI behind autonomous MRI at Q.bio before that.

Yuhua (Bill) Chen

Teaching frontier models to reason — and to act.

Post-training for frontier LLMs at Amazon AGI — where the model stops being a text predictor and starts being an agent.

RL training & verifiable rewards

Reinforcement learning for long-horizon reasoning — GRPO-style rollouts, verifier-backed rewards, and the train/eval infrastructure that turns a good idea into a repeatable model-improvement loop.
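As a rough illustration of what "GRPO-style rollouts" with "verifier-backed rewards" can mean, here is a minimal sketch, not the production setup. It assumes a hypothetical exact-match verifier (`verifier_reward`) and a hypothetical helper (`group_relative_advantages`) that scores each rollout against the mean and standard deviation of its own sampling group, which is the core idea behind group-relative advantages.

```python
from statistics import mean, pstdev


def verifier_reward(answer: str, expected: str) -> float:
    """Hypothetical verifier: 1.0 for an exact match after trimming, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (assumed GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group; eps avoids divide-by-zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled answers (illustrative data, not real results).
rollouts = ["42", "41", "42 ", "7"]
rewards = [verifier_reward(a, "42") for a in rollouts]
advantages = group_relative_advantages(rewards)
```

In this sketch, a group where every rollout earns the same reward (all correct or all wrong) yields all-zero advantages, so that prompt contributes no learning signal.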

Curriculum-calibrated gyms

Synthetic environments tuned to keep a model at the edge of its ability — difficulty that scales with capability, so reward never saturates or goes all-zero, and the reasoning gains transfer beyond the training domain.
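One simple way to picture difficulty that scales with capability is a controller that nudges the environment level to keep the success rate inside a target band. The sketch below is an assumption for illustration only: the function name `adjust_difficulty`, the 0.3 to 0.7 band, and the 1 to 10 level range are all hypothetical.

```python
def adjust_difficulty(level, success_rate, low=0.3, high=0.7,
                      min_level=1, max_level=10):
    """Return the next difficulty level given the recent success rate.

    Assumed policy: step up when the model succeeds too often (reward is
    saturating), step down when it almost never succeeds (reward is near
    all-zero), and hold steady inside the target band.
    """
    if success_rate > high:
        return min(level + 1, max_level)
    if success_rate < low:
        return max(level - 1, min_level)
    return level


# Illustrative trace: the level follows the model's measured success rate.
level = 3
for rate in [0.9, 0.8, 0.5, 0.1]:
    level = adjust_difficulty(level, rate)
```

A real gym would likely adjust several environment parameters at once and smooth the success rate over many rollouts; a single integer level keeps the idea visible.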


Multimodal diffusion models for autonomous MRI.

Same instinct, earlier chapter: push frontier generative models until they actually work. I led the AI behind Q.bio’s autonomous MRI — billion-parameter multimodal volumetric foundation models and diffusion-based reconstruction — and shipped 3D models into the scanner at sub-second inference.

3000×

faster reconstruction

27×

higher image quality

faster scans

From reconstructing scans to training reasoning agents.

  • Reinforcement learning for long-horizon, multi-step reasoning.
  • Agentic gyms & synthetic-data engines for frontier LLMs.
  • Algorithm development for multi-step agentic capability.

EDUCATION

  • University of California, Los Angeles · Ph.D.
  • University of Pennsylvania · Master
  • University of Pennsylvania · Master
  • Northeastern University · Bachelor

When I’m not training models, I’m training for marathons.

Running helps me think through complex problems and stay grounded.

3:25

MARATHON PB

5+

FULL MARATHONS

Crossing the finish line at the California International Marathon.

Always Building

Off the clock I tinker — a self-hosted journal, a little home lab humming with side projects, small tools I build for myself. Keeps the craft hands-on and the curiosity fed.

The problems worth my time don't have a recipe yet. And the part I enjoy just as much is the unglamorous one — making a new idea actually hold up in a real system, at scale, and still matter a year later.