
Train Models with Evolutionary Strategies

The paper under discussion reports that as few as 30 random Gaussian perturbations are enough to approximate a gradient, and that this approach outperforms GRPO on RLVR tasks without overfitting. Because evolutionary strategies (ES) need only forward passes to score each perturbed copy of the model, there is no backward pass, which speeds up training considerably. The author checked these findings by cleaning up the original codebase and successfully replicating the results. They also implemented LoRA and pass@k training, plan further enhancements, and encourage others to explore ES for training thinking models. This matters because it offers a more efficient way to train models, which could advance machine learning capabilities.
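To make the idea of approximating a gradient from Gaussian perturbations concrete, here is a minimal sketch of a generic ES update on a toy objective. It is not the paper's or the author's code: the function names (`es_step`, `toy_reward`, `run_demo`), the noise scale `sigma`, the learning rate `lr`, and the quadratic reward are all assumptions chosen for illustration. Only the population size of 30 comes from the summary above.

```python
import numpy as np

def es_step(theta, reward_fn, rng, pop_size=30, sigma=0.1, lr=0.05):
    """One ES update using forward evaluations only (no backward pass).

    All hyperparameters except pop_size=30 are illustrative assumptions.
    """
    # Sample one Gaussian perturbation per population member.
    eps = rng.standard_normal((pop_size, theta.size))
    # Score each perturbed parameter vector with a forward evaluation.
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    # Subtract the mean reward as a baseline to reduce variance.
    centered = rewards - rewards.mean()
    # Reward-weighted average of the perturbations estimates the gradient.
    grad_est = centered @ eps / (pop_size * sigma)
    # Gradient ascent on the reward.
    return theta + lr * grad_est

def toy_reward(theta):
    """Hypothetical stand-in for a task reward: peaks at theta == 3."""
    return -np.sum((theta - 3.0) ** 2)

def run_demo(steps=300, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(5)
    for _ in range(steps):
        theta = es_step(theta, toy_reward, rng)
    return theta

if __name__ == "__main__":
    print(run_demo())  # each entry should end up close to 3.0
```

In a language-model setting, `reward_fn` would presumably run the perturbed model on a batch of prompts and return a verifiable reward, which is why only forward passes are required.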
Read Full Article: Train Models with Evolutionary Strategies

Posted on Jan 4, 2026 by AIGeekery in Deep Dives, Learning

Topics: machine learning, AI models, AI development