Program Memory: Introduction

4 minute read

Published:

Memory-augmented neural networks (MANNs) store data in their external memory, resembling Turning Machines. Despite being theoretically Turing-Complete, MANNs cannot be trained flexibly to solve any task due to the lack of program memory. Without storing programs, it is hard to perform complicated tasks such as simulating recursive functional calls or implementing divide-and-conquer algorithms. As long as programs are not treated as data, the computing capability of neural networks is still limited.

From specialized to universal computing

A program contains instructions to execute specific tasks or functions. Low-level machines are designed with fixed programs and thus only specialize in a narrow band of tasks (e.g. washing machines, printers or microwaves). In these machines, the program is embedded in ROM, cannot be changed, and the dynamic memory RAM only stores data/variables for the program. They are also known as single-purpose machines. Modern computers, on the other hand, can be used for a variety of tasks, including graphic rendering, physics simulation or website hosting. They are programmed to store, modify and execute many other single-purpose programs and thus, are deemed as general-purpose machines. The core of such generality is the concept of stored-program memory that drives the architecture design of general-purpose computers.

One of the earliest prototypes of general-purpose computers is the Universal Turing Machine (UTM). This theoretical machine uses a memory tape (assumedly infinite) to store the programs of Turing Machines (TM). When the current task demands a specific program, the UTM’s program loads the TM program from memory and executes the program to compute the task. The stored-program memory idea is critical for universal and dynamic computation. It enables high-level programming concepts such as operating system or program that runs/programs. Later designs like Harvard or Von-Neumann architectures follow this idea with more practical and complicated interfaces and instruction sets.

From universal computing to universal learning

Modern computers can almost compute any task. However, they need human to control and design programs for the tasks at hand. In contrast, AI can automatically learn to solve the task, yet their computable tasks are often simple. As a result, we are looking for hybrid solutions that combine the best of both worlds. As such, it is necessary to equip AI with stored-program memory.



Stored-program memory stores mini-programs (lego blocks), which are retrieved to compose suitable programs (lego structures) for main networks to handle the task at hand. Can we learn everything end-to-end?

Unlike programs in computers, AI’s programs can be treated as parameters of machine learning models (e.g. the weights of neural networks) or standard programs in forms of formal language. The latter, also known as program synthesis, resembles more the process of composing machinery programs where the learned program is actually a program with predefined syntax and constraints. The former represents a more flexible and generic way to create programs, which can apply to human’s program synthesis. Under this view, neural networks’ weights are basically programs that compute specific tasks (object recognition, math reasoning, …). After training, the weights converge to the program for the task.

Put in the context of stored-program memory, we need a memory-augmented neural network that stores and accesses the weight of other neural networks to implement a general-purpose neural network. Human learns from observations to dynamically load the right programs to act in response to environments. For example, when you drive, the brain loads a “driving program” from your motor memory to control your body’s part to manipulate driving. We aim to let AI learn similar behaviors through its stored-program memory. Remarkably, it should be learned end-to-end where the AI will figure out how to learn the individual programs and the master program that runs other programs.

Now comes the key question: how to build program memory? The answer can be found in the next article - Program Memory: Method (part 1)

Check some of our papers regarding the topic: