Reason on the Fly: How RL Boosts LLM Reasoning On the Spot

A Brief History of Model Merging

Published: January 23, 2026

Model merging has recently emerged as a method of “synaptic synthesis”: integrating specialized weights from disparate models into a single, cohesive model. Just as an alchemist mixes chemicals in the hope of forging new materials that combine the best properties of their ingredients, merging neural networks lets us combine specialized knowledge without the high cost of retraining. This is especially relevant for Large Language Models (LLMs), where fine-tuning is expensive. Moreover, with numerous pretrained models available across different domains and modalities, it would be ideal to combine them into a universal master model that excels at any topic.

Improving LLM Reasoning with RL Post-Training

Published: December 09, 2025

Large language models are getting better at reasoning, not because we made them bigger, but because we finally learned how to teach them after pre-training, a.k.a. post-training. Continuing our series on RL for LLM reasoning, today’s post reviews recent papers that boost LLM reasoning capability through RL post-training. If you care about strengthening a model’s intrinsic reasoning capabilities rather than bolting on expensive test-time scaling or multi-sample decoding, this overview highlights the methods that genuinely transform the model.

Think Before You Speak: Reinforcement Learning for LLM Reasoning

Published: May 21, 2025

Large Language Models (LLMs) have shown remarkable capabilities across a range of natural language tasks. Yet when you give them a problem that needs careful thinking, like a tricky math question or a complicated document, they can suddenly stumble. They can talk the talk, but when it comes to putting things together step by step, they can get lost.

The Best of Time-Series Forecasting (Part II): Advancements in Time Series Modeling Through Large Language Models

Published: April 09, 2025

Part I of this series looked at how time-series forecasting has evolved, from traditional models like ARIMA to deep learning methods like Transformers. These approaches brought big improvements, especially in handling complex and long-range patterns. However, they also have limits, particularly when it comes to adapting to new data or generalizing across very different domains.

