GRPO Advantage Visualizer
Group Size (n)
Reward Min
Reward Max
Epsilon (ε)
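The four parameters above describe one group of rewards. The sketch below shows one way to draw such a group. It assumes each reward is sampled uniformly from [Reward Min, Reward Max]. The visualizer's actual sampling rule is not stated, and the function name `sample_group_rewards` is hypothetical.

```python
import random
from typing import List, Optional


def sample_group_rewards(n: int, reward_min: float, reward_max: float,
                         seed: Optional[int] = None) -> List[float]:
    """Draw n rewards for one group (hypothetical helper).

    Assumption: each reward is drawn independently and uniformly from
    [reward_min, reward_max]; the visualizer may use a different rule.
    """
    if n < 1:
        raise ValueError("group size n must be at least 1")
    if reward_min > reward_max:
        raise ValueError("reward_min must not exceed reward_max")
    rng = random.Random(seed)  # local RNG so a seed gives reproducible groups
    return [rng.uniform(reward_min, reward_max) for _ in range(n)]


group = sample_group_rewards(n=8, reward_min=0.0, reward_max=1.0, seed=0)
print(group)
```

A fixed `seed` makes a group reproducible, which is convenient when comparing how different epsilon values change the advantages for the same rewards.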
Individual Reward Controls
1. Reward Landscape
Drag the bars to adjust each reward \(r_i\)
2. Computed Advantages
Z-score of each reward within its group
\(A_i = \frac{r_i - \bar{r}}{\sigma_r + \varepsilon}\)
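The formula normalizes each reward against its own group: subtract the group mean \(\bar{r}\), then divide by the group standard deviation \(\sigma_r\) plus \(\varepsilon\). The sketch below computes this directly. It assumes the population standard deviation (dividing by n). Some implementations use the sample standard deviation (dividing by n − 1) instead, which gives slightly smaller advantages. The name `grpo_advantages` is illustrative.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Compute A_i = (r_i - mean(r)) / (std(r) + eps) for one group.

    Assumption: std is the population standard deviation (divide by n).
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("group must contain at least one reward")
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(variance)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


# Two correct (1.0) and two incorrect (0.0) responses: mean 0.5, std 0.5,
# so the advantages are approximately [1, -1, -1, 1].
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

The advantages in a group always sum to zero, so above-average responses are pushed up and below-average ones are pushed down. When every reward is identical, each numerator is zero and all advantages are 0. In that case \(\varepsilon\) simply prevents a division by zero.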