My Random Notes
Just some of my incomplete ideas:
- Some theories on fast weights. I would like to explain why fast weights preserve backprop gradients better than slow weights. The proof may be wrong.
- Ideas to learn memory with optimal control. From an optimal control perspective, we can learn optimal behaviors for a neural memory. However, it turned out to be too complex to implement, so I dropped this idea.
- A new form of attention, just for reconstruction. The attention mechanism is an unsupervised operator: it builds the attention weights by analogy (similarity). I thought we could make attention explicitly supervised so that it optimizes a specific objective. Hence, I constructed an attention where the attention weights are computed to minimize a reconstruction loss. It should be useful for reconstruction (check this paper); a rough sketch follows this list. Maybe one of my future works will revisit this idea.
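
For concreteness, here is a minimal sketch of what a reconstruction-driven attention could look like. The ridge-regularized least-squares solve, the function name `reconstruction_attention`, and the toy data are my own assumptions for illustration; the note above does not specify the exact formulation.

```python
import numpy as np

def reconstruction_attention(query, values, ridge=1e-3):
    """Hypothetical sketch: choose attention weights that minimize the
    reconstruction loss ||query - weights @ values||^2 (plus a small
    ridge penalty), instead of softmax over similarity scores.

    query:  (d,)   vector to reconstruct
    values: (n, d) memory slots to reconstruct from
    ridge:  small L2 penalty so the solve stays well-posed
    """
    # Closed-form ridge solution:
    # weights = argmin_w ||query - values^T w||^2 + ridge * ||w||^2
    gram = values @ values.T + ridge * np.eye(values.shape[0])
    weights = np.linalg.solve(gram, values @ query)
    output = weights @ values  # reconstruction of the query from memory
    return weights, output

# Toy usage: a query lying in the span of the first two memory slots.
values = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([0.5, 0.5, 0.0])
weights, recon = reconstruction_attention(query, values)
print(weights)  # roughly [0.5, 0.5, 0.0]
print(recon)    # close to the original query
```

The point of the sketch is only the objective: the weights are whatever minimizes the reconstruction error, rather than a normalized similarity score, so the "attention" is supervised by the reconstruction target itself.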