GRPO Advantage Visualizer

Group Size (n)

Reward Min

Reward Max

Epsilon

Individual Reward Controls

1. Reward Landscape


2. Computed Advantages

Interactive z-score mapping

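\(A_i = \frac{r_i - \bar{r}}{\sigma_r + \varepsilon}\), where \(\bar{r}\) and \(\sigma_r\) are the mean and standard deviation of the n rewards in the group, and \(\varepsilon\) is a small constant that keeps the denominator nonzero.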
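As a rough illustration of this z-score mapping, here is a minimal Python sketch that turns one group of rewards into advantages. The function name `grpo_advantages`, the default epsilon of 1e-8, and the choice of the population standard deviation (dividing by n rather than n - 1) are assumptions made for this sketch, not details taken from the visualizer. Some implementations use the sample standard deviation instead.

```python
import math
from typing import Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: group-relative advantages A_i = (r_i - mean) / (std + eps).

    Assumption: uses the population standard deviation (divide by n).
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population variance of the group's rewards.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    # eps keeps the division finite when every reward is identical (std == 0).
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of n = 4 sampled completions with rewards in [0, 1].
    rewards = [1.0, 0.0, 0.5, 0.5]
    print([round(a, 4) for a in grpo_advantages(rewards)])
    # prints [1.4142, -1.4142, 0.0, 0.0]
```

Because each reward is centered on the group mean, the advantages sum to approximately zero: completions that beat their siblings get positive advantages and the rest get negative ones. When every reward in the group is identical, the standard deviation is 0 and epsilon keeps the denominator nonzero, so all advantages come out as 0.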