My Random Notes

Just some of my incomplete ideas:

  • I would like to explain why fast weights are better than slow weights at preserving backprop gradients. My proof may well turn out to be wrong. (A rough sketch of what I mean by fast weights appears after this list.)
  • From an optimal control perspective, we could learn optimal behaviors for a neural memory. However, it turned out to be too complex to implement, so I dropped this idea.
  • The attention mechanism is an unsupervised operator: it builds the attention weights purely from similarity (analogy). I thought we could make attention explicitly supervised, i.e. compute the weights to optimize a specific objective. Hence, I constructed an attention whose weights are computed to minimize a reconstruction loss. It should be useful for reconstruction (check this paper). Maybe some future work of mine will revisit this idea. (A toy version is sketched after this list.)
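
Below is a minimal sketch of what I mean by fast weights: an associative matrix that is rewritten on the fly by an outer-product (Hebbian-style) rule, while slow weights only change through gradient descent. The decay `lam`, the rate `eta`, and the key/value outer-product form are illustrative assumptions of this sketch, not a worked-out rule, and it says nothing yet about the gradient-preservation argument itself.

```python
import numpy as np

def fast_weight_step(A, k, v, lam=0.95, eta=1.0):
    """Hebbian-style fast-weight update: decay the old memory and
    superimpose the current key/value association."""
    return lam * A + eta * np.outer(v, k)

def fast_weight_read(A, q):
    """Read the fast-weight memory with a query vector."""
    return A @ q

# Toy usage: store one association, then retrieve it.
d = 4
rng = np.random.default_rng(0)
A = np.zeros((d, d))
k, v = rng.standard_normal(d), rng.standard_normal(d)
A = fast_weight_step(A, k, v)
print(fast_weight_read(A, k / np.dot(k, k)))  # recovers roughly eta * v
```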
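
And here is a toy version of the reconstruction-driven attention from the last bullet. Standard dot-product attention reads the weights off from similarities; the variant below instead solves a small ridge-regularized least-squares problem so the weights explicitly minimize the reconstruction error of the input from the memory slots. The function names, the ridge term, and the fact that the weights are not constrained to a simplex are all assumptions of this sketch, not the exact construction I had in mind.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dot_product_attention(q, M):
    """Standard attention: weights come from similarity alone."""
    return softmax(M @ q)

def reconstruction_attention(x, M, ridge=1e-3):
    """Weights chosen to minimize ||x - a @ M||^2 + ridge * ||a||^2,
    i.e. attention weights optimized for a reconstruction objective."""
    n = M.shape[0]
    G = M @ M.T + ridge * np.eye(n)   # Gram matrix of the memory slots
    return np.linalg.solve(G, M @ x)  # closed-form minimizer

# Toy usage: the input is a mixture of two memory slots.
d, n = 8, 5
rng = np.random.default_rng(0)
M = rng.standard_normal((n, d))
x = 0.7 * M[0] + 0.3 * M[3]
a = reconstruction_attention(x, M)
print(np.round(a, 2))                          # roughly [0.7, 0, 0, 0.3, 0]
print(np.linalg.norm(x - a @ M))               # small reconstruction error
print(np.round(dot_product_attention(x, M), 2))  # similarity-based weights, for contrast
```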