GRPO Advantage Visualizer

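Z-score variant: \(A_i = \dfrac{r_i - \bar{r}}{\sigma_r + \varepsilon}\), where \(r_i\) is the reward of the \(i\)-th sample in the group, \(\bar{r}\) and \(\sigma_r\) are the group's mean and standard deviation of rewards, and \(\varepsilon\) is a small constant for numerical stability. When \(\sigma_r = 0\), all advantages are set to 0.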
Mean-centered variant: \(A_i = r_i - \bar{r}\) (no variance normalization).
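
The two variants above can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's actual code: the function name `grpo_advantages`, the `variant` strings, the default `eps=1e-8`, and the use of the population standard deviation (dividing by the group size rather than the group size minus one) are all assumptions made for the example.

```python
import math


def grpo_advantages(rewards, variant="zscore", eps=1e-8):
    """Compute per-sample advantages for one group of rewards.

    Hypothetical helper: names, defaults, and the population-std
    convention are assumptions, not taken from the source.
    """
    n = len(rewards)
    if n == 0:
        return []

    mean = sum(rewards) / n
    centered = [r - mean for r in rewards]

    if variant == "mean":
        # Mean-centered variant: A_i = r_i - mean, no variance normalization.
        return centered
    if variant != "zscore":
        raise ValueError(f"unknown variant: {variant!r}")

    # Population standard deviation of the group (assumed convention).
    std = math.sqrt(sum(c * c for c in centered) / n)
    if std == 0.0:
        # Every reward is identical: all advantages are set to 0.
        return [0.0] * n

    # Z-score variant: A_i = (r_i - mean) / (std + eps).
    return [c / (std + eps) for c in centered]
```

For example, `grpo_advantages([1.0, 0.0, 0.0, 1.0])` yields values very close to `[1, -1, -1, 1]` (slightly smaller in magnitude because of \(\varepsilon\)), while `grpo_advantages([1.0, 2.0, 3.0], variant="mean")` returns `[-1.0, 0.0, 1.0]`. The z-score variant makes advantages comparable across groups with different reward scales; the mean-centered variant preserves the raw reward spread.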