You May Also Enjoy
Improving LLM Reasoning with RL Post-Training
less than 1 minute read
Published:
Large language models are getting better at reasoning, not because we made them bigger, but because we finally learned how to teach them after pre-training, a.k.a. post-training. Continuing our series on RL for LLM reasoning, today’s blog reviews recent papers that boost LLM reasoning capability via post-training with RL. If you care about strengthening a model’s intrinsic reasoning capabilities rather than bolting on expensive test-time scaling or multi-sample decoding, this overview highlights the methods that genuinely transform the model. Read more
Reason on the Fly: How RL Boosts LLM Reasoning On the Spot
less than 1 minute read
Published:
In our last post, we warmed up with why reinforcement learning (RL) is a powerful paradigm for building smarter AI reasoners. Today, we zoom in on an exciting approach: using RL at inference time to improve large language model (LLM) reasoning on the spot. In particular, we explore ways to inject real-time reasoning into static LLMs. Let’s break down how RL can transform a frozen LLM into a more dynamic, deliberate reasoner at runtime. Read more
Think Before You Speak: Reinforcement Learning for LLM Reasoning
1 minute read
Published:
Large Language Models (LLMs) have shown remarkable capabilities across a range of natural language tasks. Yet, give them a problem that needs a bit of careful thinking, like a tricky math question or a complicated document, and they can suddenly stumble. It’s as if they can talk the talk, but when it comes to really putting things together step by step, they get lost. Read more
The Best of Time-Series Forecasting (Part II): Advancements in Time Series Modeling Through Large Language Models
less than 1 minute read
Published:
Part 1 of my blog looked at how time-series forecasting has evolved, from traditional models like ARIMA to deep learning methods like Transformers. These approaches brought big improvements, especially in handling complex and long-range patterns. However, they also have limits, particularly when it comes to adapting to new data or generalizing across very different domains. Read more