4. Dynamic Programming
Dynamic Programming (model-based approach)
1. Prediction problem (policy evaluation): treat the Bellman equation as an update rule. A basic linear solver would also work, but it doesn't scale, so an iterative DP approach is applied.
2. Control problem (policy improvement): policy improvement theorem: if taking a different action in a state once (and following the current policy afterwards) improves the value, taking it every time in that state gives a policy that is at least as good. Policy ..
2023.01.20
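The two steps in the summary above can be sketched on a toy problem. This is only an illustration under stated assumptions: the three-state chain `P`, the discount `gamma=0.9`, and the function names `policy_evaluation` and `greedy_improvement` are hypothetical and do not come from the post. The first function repeatedly applies the Bellman expectation equation as an update rule; the second performs one greedy improvement step with respect to the resulting values.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the post).
# P[s][a] is a list of (probability, next_state, reward) tuples.
# States 0 and 1 are ordinary states; state 2 is terminal (absorbing, reward 0).
# Action 0 moves left, action 1 moves right; every non-terminal step costs -1.
P = [
    [[(1.0, 0, -1.0)], [(1.0, 1, -1.0)]],  # state 0
    [[(1.0, 0, -1.0)], [(1.0, 2, -1.0)]],  # state 1
    [[(1.0, 2, 0.0)], [(1.0, 2, 0.0)]],    # state 2 (terminal)
]


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(prob * (reward + gamma * V[s_next]) for prob, s_next, reward in P[s][a])


def policy_evaluation(P, policy, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation: sweep the Bellman expectation update in place
    until the largest change in any state value falls below tol."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = sum(
                policy[s][a] * q_value(P, V, s, a, gamma) for a in range(len(P[s]))
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def greedy_improvement(P, V, gamma=0.9):
    """Return a deterministic policy that is greedy with respect to V."""
    new_policy = []
    for s in range(len(P)):
        q = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
        best = max(range(len(q)), key=q.__getitem__)  # ties go to the lowest index
        new_policy.append([1.0 if a == best else 0.0 for a in range(len(q))])
    return new_policy


# Start from a uniformly random policy, evaluate it, improve once, re-evaluate.
random_policy = [[0.5, 0.5] for _ in P]
V_random = policy_evaluation(P, random_policy)
improved_policy = greedy_improvement(P, V_random)
V_improved = policy_evaluation(P, improved_policy)

print("random policy values:  ", V_random)
print("improved policy values:", V_improved)
```

On this chain the improved policy moves right in both non-terminal states, and its value is no lower than the random policy's in every state, which is what the policy improvement theorem guarantees. Updating `V` in place during a sweep is a design choice here; keeping a separate copy of the previous values per sweep also converges.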