Importance sampling placement in TDRC

Temporal-Difference Learning with Regularized Corrections (TDRC) is one of the few stable algorithms for learning a value function off-policy with function approximation (e.g., neural networks, tile coding, or radial basis functions). “Stable” here means the expected update over the state distribution is a contraction: applying it decreases error in expectation. Some of the most popular value function learning algorithms, like Retrace and V-trace, are not stable and are not guaranteed to reduce error or converge when combined with function approximation. Stability is a crucial property for a learning algorithm, especially one headed for real-world applications.
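To make the off-policy setting concrete, here is a minimal sketch of a TDRC-style update with linear value features, based on my reading of the GTD family of methods: `w` are the value weights, `h` are the secondary correction weights, `beta` regularizes `h`, and `rho` is the importance sampling ratio. The function name, step sizes, and placement of `rho` are my own illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def tdrc_update(w, h, x, x_next, r, gamma, rho, alpha=0.1, beta=1.0):
    """One sketch of an off-policy TDRC update for v(s) = w @ x.

    Assumptions (not from the original post): rho multiplies the
    sampled terms, and h carries an L2 penalty with strength beta.
    """
    delta = r + gamma * (w @ x_next) - (w @ x)  # TD error
    # Primary weights: TD update plus a TDC-style gradient correction
    w = w + alpha * rho * (delta * x - gamma * (h @ x) * x_next)
    # Secondary weights: estimate the expected TD error, regularized toward 0
    h = h + alpha * (rho * (delta - h @ x) * x - beta * h)
    return w, h
```

With `w = h = 0`, the first update reduces to plain TD: `delta = r` and `w` moves along `alpha * r * x`, which is a quick sanity check that the correction terms vanish when `h` is zero.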

Productivity during a pandemic

The abrupt switch to working from home utterly destroyed my routines and with them my ability to get anything done. A long period of floundering later, I realized the problem wasn’t fixing itself and I needed to be more deliberate about my approach. This post contains some suggestions to hopefully help you work more effectively from home.

Gram matrix intuition

Recently I came across the Gram matrix and was trying to understand what it meant intuitively. As with most of my attempts to understand new mathematical concepts, the process quickly became recursive: the definitions relied on other terms I was also unfamiliar with.
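Before the definitions recurse too far, the core object itself is simple: the Gram matrix of a set of vectors is the matrix of all pairwise inner products, `G[i, j] = <v_i, v_j>`. A minimal numerical illustration:

```python
import numpy as np

# Three vectors in R^2, stored as rows of V
V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# Gram matrix: G[i, j] is the dot product of vector i with vector j
G = V @ V.T
# → [[1. 1. 0.]
#    [1. 2. 2.]
#    [0. 2. 4.]]
```

The diagonal holds squared norms, the off-diagonals encode the angles between vectors (zero means orthogonal, as with the first and third rows here), and the whole matrix is symmetric and positive semidefinite.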