All Posts (10)
5. Monte Carlo/ Temporal Difference
Unlike dynamic programming, from now on we assume that we don't know the probability dynamics of the environment (a.k.a. $p(s', r \mid s, a)$). Model-based and model-free are the terms for knowing and not knowing the dynamics of the environment, respectively. In real problems, assuming that we know the model is not usual. Let's take a look at a case where we play Go. Our action of placing a Go stone in chec..
2023.01.31
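To make the model-free idea above concrete, here is a minimal tabular TD(0) prediction sketch; the toy random-walk environment, the random policy, and the step-size/discount constants are assumptions for illustration, not from the post. The key point is that the update uses only sampled transitions, never $p(s', r \mid s, a)$:

```python
import random
from collections import defaultdict

def td0_prediction(env_step, policy, episodes=1000, alpha=0.1, gamma=1.0, start=2):
    """Tabular TD(0): move V(s) toward r + gamma * V(s') after every step."""
    V = defaultdict(float)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = policy(s)
            s_next, r, done = env_step(s, a)  # sampled transition: no model needed
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])   # bootstrap from the current estimate
            s = s_next
    return V

# Toy 5-state random walk: states 0..4, terminate past either end, +1 on the right.
def walk_step(s, a):
    s_next = s + a
    if s_next < 0:
        return s_next, 0.0, True
    if s_next > 4:
        return s_next, 1.0, True
    return s_next, 0.0, False

V = td0_prediction(walk_step, policy=lambda s: random.choice([-1, 1]))
```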
4. Dynamic Programming
Dynamic Programming (model-based approach). 1. Prediction problem (policy evaluation): treat the Bellman equation like an update rule. We could have just used a basic linear solver, but that doesn't scale, so an iterative DP approach is applied. 2. Control problem (policy improvement): policy improvement theorem: if changing an action once improves the value, changing it every time will give us a better policy. Policy ..
2023.01.20
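To make "treat the Bellman equation like an update rule" concrete, a minimal iterative policy evaluation sketch; the two-state MDP and the array encoding of $P$ and $R$ are assumptions for illustration:

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, theta=1e-8):
    """Sweep the Bellman expectation equation as an update rule until V stops changing:
    V(s) <- sum_a pi(a|s) * sum_{s'} P[s,a,s'] * (R[s,a,s'] + gamma * V(s'))."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(pi[s, a] * np.sum(P[s, a] * (R[s, a] + gamma * V))
                    for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place sweep
        if delta < theta:
            return V

# Two-state example: action 0 stays (reward 0), action 1 switches (reward 1 from state 0).
P = np.zeros((2, 2, 2)); R = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0; P[0, 1, 1] = 1.0; R[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0; P[1, 1, 0] = 1.0
pi = np.full((2, 2), 0.5)  # equiprobable policy
print(policy_evaluation(P, R, pi))
```

Unlike a direct linear solve of the Bellman system, the iterative sweep only ever touches one state's value at a time, which is what lets the DP approach scale to larger state spaces.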
3. Finite Markov Decision Processes
3.1 The Agent–Environment Interface: $p(s', r \mid s, a) \doteq \Pr\{S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a\}$ for all $s', s \in S$, $r \in R$, and $a \in A(s)$. The function $p$ defines the dynamics of the MDP. Whereas the bandit problem doesn't change the state of the environment, in reinforcement learning the agent's action changes the state of the environment. State transition probability: $p(s' \mid s, a) = \sum_{r \in R} p(s', r \mid s, a)$ (use this when the reward is deterministic). The expected rewards for state–actio..
2023.01.13
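To make the four-argument dynamics function concrete, a small sketch of how the state-transition probability and the expected reward both fall out of $p(s', r \mid s, a)$; the dict encoding of $p$ is an assumed representation, not from the post:

```python
# p maps (s, a) -> list of (s_next, r, prob) triples, i.e. p(s', r | s, a).
def expected_reward(p, s, a):
    # r(s, a) = sum over s', r of r * p(s', r | s, a)
    return sum(prob * r for (_s_next, r, prob) in p[(s, a)])

def transition_prob(p, s, a, s_query):
    # p(s' | s, a) = sum over r of p(s', r | s, a)
    return sum(prob for (s_next, _r, prob) in p[(s, a)] if s_next == s_query)

# Example: from state 0, action 0 reaches state 1 with reward 1 (prob 0.8)
# or stays in state 0 with reward 0 (prob 0.2).
p = {(0, 0): [(1, 1.0, 0.8), (0, 0.0, 0.2)]}
print(expected_reward(p, 0, 0))     # 0.8
print(transition_prob(p, 0, 0, 1))  # 0.8
```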
2. Multi-Armed Bandits
2.1 k-armed Bandit Problem Consider the following learning problem. You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action sel..
2023.01.13
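A minimal sketch of the setup described above: epsilon-greedy action selection with incremental sample-average value estimates; the Gaussian reward distributions and the value of epsilon are assumptions for illustration:

```python
import random

def run_bandit(true_means, steps=1000, eps=0.1):
    """k-armed bandit with epsilon-greedy selection and incremental averaging:
    Q(a) <- Q(a) + (1 / N(a)) * (r - Q(a))."""
    k = len(true_means)
    Q = [0.0] * k  # value estimates
    N = [0] * k    # pull counts
    total = 0.0
    for _ in range(steps):
        if random.random() < eps:
            a = random.randrange(k)                # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit the current best estimate
        r = random.gauss(true_means[a], 1.0)       # stationary reward distribution
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental sample average
        total += r
    return Q, total

Q, total = run_bandit([0.2, 0.8, 0.5])  # the agent should mostly settle on arm 1
```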
1. RL Introduction
1.1 Reinforcement Learning. Reinforcement learning is focused on goal-directed learning from interaction more than other approaches to machine learning. - Difference from other machine learning techniques: reinforcement learning is different from supervised/unsupervised learning. An agent must be able to learn from its own experience. Supervised learning learns by..
2023.01.12
ResNet
When networks go deeper, a degradation problem is exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly. A 56-layer model performs worse on both training and test error -> the deeper model performs worse, but it's not caused by overfitting. Intuition: "Make it deep, but remain shallow." Problem: given a shallower network - how can we take it, add extra ..
2022.12.03
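A minimal sketch of the residual (shortcut) connection that addresses this degradation problem; PyTorch is assumed, and the block is simplified relative to the paper (identity shortcut only, no projection or downsampling):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Learn the residual F(x) = H(x) - x; the output is F(x) + x, so extra
    layers can fall back to the identity instead of degrading the network."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

x = torch.randn(1, 16, 8, 8)
y = ResidualBlock(16)(x)  # same shape as the input
```

If the extra layers learn F(x) = 0, the block reduces to the identity, so a deeper network can in principle do no worse than its shallower counterpart, which is exactly the "make it deep, but remain shallow" intuition.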