Improving LLM Reasoning with RL Post-Training
Published:
Large language models are getting better at reasoning not because we made them bigger, but because we finally learned how to teach them after pre-training, a.k.a. post-training. Continuing our series on RL for LLM reasoning, today's blog reviews recent papers that boost LLM reasoning capability through RL-based post-training. If you care about strengthening a model's intrinsic reasoning abilities rather than bolting on expensive test-time scaling or multi-sample decoding, this overview highlights the methods that genuinely transform the model.