GRPO Advantage Visualizer

1. Reward Landscape
Drag bars to adjust rewards
2. Computed Advantages
Interactive Z-Score mapping
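\(A_i = \frac{r_i - \bar{r}}{\sigma_r + \varepsilon}\)
where \(r_i\) is the reward of the \(i\)-th sampled response in the group, \(\bar{r}\) and \(\sigma_r\) are the mean and standard deviation of the group's rewards, and \(\varepsilon\) is a small constant that prevents division by zero when all rewards are equal.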
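The z-score mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's own code: the function name `grpo_advantages`, the default `eps=1e-8`, and the use of the population standard deviation are all assumptions, since the formula does not say which estimator or epsilon is used.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Map each reward in a group to A_i = (r_i - mean) / (std + eps)."""
    mean_r = fmean(rewards)
    # Assumption: population standard deviation. Some implementations
    # use the sample (n - 1) estimator instead.
    std_r = pstdev(rewards)
    return [(r - mean_r) / (std_r + eps) for r in rewards]


# Hypothetical group of four rewards, as if set by dragging the bars.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(rewards)
print(advantages)
```

With these rewards the mean and standard deviation are both 0.5, so the advantages come out very close to +1 for the rewarded responses and -1 for the others. When every reward in the group is identical, the numerator is zero for each response and all advantages are 0, which is the case `eps` guards against.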