The Mamba Effect: State Space Models Taking on Transformers

Table of Contents

  • Large Language Models, Transformers, and the Fundamental Bottleneck
  • Mamba Dissection: A Top-Down Approach
    • Linear-Time Decoding
    • State Space Model Foundation
    • Selective State Spaces
  • Mamba Empirical Performance
    • Mamba is Faster than Transformers
    • Mamba Scales Linearly up to a Million Tokens
    • State-of-the-art Performance
    • Rivaling Transformers in Language Modeling
  • Ablation Studies
  • Final Thoughts
  • Appendix
    • Explanation of the SSM Discretization Formula
    • SSM CNN-RNN View Equivalence
    • Mamba Relation to Gated RNN

Large Language Models, Transformers, and the Fundamental Bottleneck

Large Language Models (LLMs) are pretrained on massive datasets in the pursuit of AGI (Artificial General Intelligence). As an unwritten rule, the Transformer [9] architecture is the backbone of LLMs thanks to its ability to capture rich representations through attention layers, which give the model direct access to every past input at any point during processing. This capability, however, comes at a computational cost of O(L²), where L is the number of timesteps (tokens) the Transformer processes: every token attends to every other token, so both compute and memory grow quadratically with sequence length.
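
To make the quadratic cost concrete, below is a minimal NumPy sketch of single-head self-attention. The function name, weight shapes, and sequence length are illustrative choices rather than the implementation of any particular LLM; the point is that the L × L score matrix is what drives the O(L²) time and memory.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a length-L sequence X of shape (L, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # The score matrix is L x L: every token attends to every other token,
    # which is the source of the O(L^2) compute and memory cost.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax (shifted by the row max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Illustrative sizes: doubling L quadruples the size of `scores`.
L, d = 1024, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((L, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # out has shape (L, d); scores held L*L floats
```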

[Figure: Mamba vs. Transformer]
