ML
kimdj104.tistory.com/m

kimdj104's blog.

Post list

  • 5. Monte Carlo / Temporal Difference Unlike dynamic programming, from now on we assume that we do not know the probability dynamics of the environment (i.e. $p(s', r \mid s, a)$). Model-based and model-free are the terms for knowing and not knowing the environment's dynamics, respectively. In real problems, assuming that we know the model is unusual. Consider the game of Go: our action of placing a stone in chec.. 2023. 1. 31.
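A minimal sketch of the model-free idea in this post (the two-state chain, step size, and function names are illustrative assumptions, not from the original): a tabular TD(0) backup needs only sampled transitions, never $p(s', r \mid s, a)$.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """One model-free TD(0) backup: move V[s] toward the bootstrapped target."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Toy 2-state chain: state 0 -> state 1 (reward 0), state 1 -> terminal (reward 1).
V = [0.0, 0.0, 0.0]  # V[2] is the terminal state, fixed at 0
for _ in range(200):
    td0_update(V, 0, 0.0, 1)  # sampled transition 0 -> 1, reward 0
    td0_update(V, 1, 1.0, 2)  # sampled transition 1 -> terminal, reward 1
```

With enough sampled transitions, both estimates approach the true values (1.0 here) without the update ever touching the dynamics function.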
  • 4. Dynamic Programming Dynamic Programming (model-based approach) 1. Prediction problem (policy evaluation): treat the Bellman equation as an update rule. We could have just used a basic linear solver, but that does not scale, so an iterative DP approach is applied. 2. Control problem (policy improvement): Policy improvement theorem: if changing an action once improves the value, changing it every time gives us a better policy. Policy .. 2023. 1. 20.
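The "Bellman equation as an update rule" step can be sketched as iterative policy evaluation on a toy MDP (the two-state dynamics, gamma, and stopping threshold are made-up assumptions for illustration):

```python
# dynamics[s] = list of (prob, next_state, reward) under the fixed policy.
dynamics = {
    0: [(1.0, 1, 0.0)],
    1: [(1.0, 1, 1.0)],  # state 1 loops to itself with reward 1
}
gamma, theta = 0.9, 1e-8
V = {0: 0.0, 1: 0.0}
while True:
    delta = 0.0
    for s, outcomes in dynamics.items():
        # Bellman expectation backup used as an in-place sweep update
        v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < theta:
        break
```

The sweep converges to the exact solution of the linear system (here V(1) = 1/(1-0.9) = 10, V(0) = 0.9 * 10 = 9), which is why the iterative route scales where a direct solver would not.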
  • 3. Finite Markov Decision Processes 3.1 The Agent–Environment Interface for all $s', s \in S$, $r \in R$, and $a \in A(s)$. The function $p$ defines the dynamics of the MDP. Whereas the bandit problem does not change the state of the environment, in reinforcement learning the agent's actions change the state of the environment. State transition probability (use this when the reward is deterministic): The expected rewards for state–actio.. 2023. 1. 13.
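How the state-transition probability and expected reward fall out of the four-argument dynamics can be sketched with a small dictionary (all probabilities and state/action names below are illustrative assumptions): $p(s' \mid s, a) = \sum_r p(s', r \mid s, a)$ and $r(s, a) = \sum_{s', r} r \, p(s', r \mid s, a)$.

```python
# Four-argument dynamics p(s', r | s, a), keyed by (s_next, reward, s, a).
p = {
    ("s1", 0.0, "s0", "a"): 0.3,
    ("s1", 1.0, "s0", "a"): 0.2,
    ("s2", 0.0, "s0", "a"): 0.5,
}

def transition_prob(s_next, s, a):
    # p(s' | s, a): marginalize the reward out of the dynamics
    return sum(prob for (sp, r, ss, aa), prob in p.items()
               if sp == s_next and ss == s and aa == a)

def expected_reward(s, a):
    # r(s, a): average the reward over everything the dynamics allows
    return sum(r * prob for (sp, r, ss, aa), prob in p.items()
               if ss == s and aa == a)
```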
  • 2. Multi-Armed Bandits 2.1 The k-armed Bandit Problem Consider the following learning problem. You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward drawn from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action sel.. 2023. 1. 13.
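The setup above can be sketched as epsilon-greedy action selection with incremental sample-average estimates (the Gaussian arms, seed, and epsilon value are illustrative assumptions, not part of the problem statement):

```python
import random

random.seed(0)
k = 5
true_means = [random.gauss(0, 1) for _ in range(k)]  # stationary arm means
Q = [0.0] * k   # value estimates, one per action
N = [0] * k     # pull counts, one per action
epsilon = 0.1
for t in range(2000):
    if random.random() < epsilon:
        a = random.randrange(k)                  # explore a random arm
    else:
        a = max(range(k), key=lambda i: Q[i])    # exploit the greedy arm
    r = random.gauss(true_means[a], 1)           # stationary reward draw
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                    # incremental sample average
```

The incremental form avoids storing past rewards: each Q value is exactly the mean of the rewards observed for that action so far.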
  • 1. RL Introduction 1.1 Reinforcement Learning Reinforcement learning is focused on goal-directed learning from interaction more than other approaches to machine learning. Difference from other machine learning techniques: reinforcement learning differs from supervised/unsupervised learning in that an agent must be able to learn from its own experience, whereas supervised learning is learned by.. 2023. 1. 12.
  • ResNet When networks go deeper, a degradation problem is exposed: as network depth increases, accuracy gets saturated and then degrades rapidly. A 56-layer model performs worse on both training and test error -> the deeper model performs worse, but this is not caused by overfitting. Intuition: "make it deep, but remain shallow". Problem: given a shallower network, how can we take it, add extra .. 2022. 12. 3.
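The "deep but remain shallow" intuition can be sketched as a minimal residual connection in numpy (the shapes, weights, and function names are illustrative assumptions): the block outputs F(x) + x, so with the residual branch near zero the extra layer starts out close to the identity instead of degrading the shallower solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    f = relu(x @ W1) @ W2   # the residual branch F(x)
    return relu(f + x)      # identity shortcut: output is F(x) + x

x = rng.normal(size=(4, 8))
# Zero-initialized residual branch: F(x) = 0, so the block passes x through
W1 = np.zeros((8, 8))
W2 = np.zeros((8, 8))
y = residual_block(x, W1, W2)
```

With the branch zeroed, the block reduces to relu(x): stacking such blocks cannot make the deeper model worse than the shallow one, which is the point of the shortcut.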
  • Ubuntu 22.10 (Nvidia driver, CUDA, cuDNN installation) Installing Ubuntu 22.04 in VirtualBox/VMware (a virtual machine) does not work -> install a working version via dual boot. Check the graphics card and its drivers: list the card and installable drivers with ubuntu-drivers devices; check the card currently in use with lshw -numeric -C display or lspci | grep -i nvidia. 1. Install the recommended driver: sudo ubuntu-drivers autoinstall 2. Manual install: sudo apt install nvidia-driver-515, then sudo reboot. Clean install: sudo apt-get remove --purge nvidia-*, sudo apt-get autoremove, sudo apt-get update. Graphics card .. 2022. 12. 2.
  • Solution Challenge Idea Collection Idea 1. Digitalization has hurt reading literacy -> a feature that adds illustrations to text to help reading comprehension. MVP stage: limited to English novels written in dense prose. Implement it as an e-book where the reader can also view a summary, so they can get help when the text is hard to understand. ( ) While illustrations are generated, let the user mark the ones they like (like/dislike). ( ) SDGs: (Goal 4) Quality Education. Step 1) NLP: studio.oneai.com implements summarize/keywords/headline extraction. https://studio.oneai.com/ One AI Language Studio One AI is an API.. 2022. 11. 22.
  • Transformer (Attention Is All You Need) Autoregressive LM (GPT) vs autoencoding LM (BERT): an autoregressive LM is a causal language model; an autoencoding LM is a masked language model. Transformer architecture. Tokenizing vs embedding vs encoding: tokenizing is the process that converts text to token indices; embedding is the process that converts tokenized words to vectors; encoding is the process that converts embedded vectors to a sentence matrix. Positional encoding: positi.. 2022. 11. 17.
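The tokenize -> embed -> encode pipeline above can be sketched end to end (the tiny vocabulary, the embedding table values, and d_model are illustrative assumptions; only the sinusoidal positional-encoding formula is from the paper):

```python
import math

vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}
d_model = 8
# Toy embedding table: one d_model-dim vector per token id
table = [[(i + 1) * 0.01 * (j + 1) for j in range(d_model)]
         for i in range(len(vocab))]

def tokenize(text):
    return [vocab[w] for w in text.split()]   # text -> token indices

def embed(ids):
    return [table[i] for i in ids]            # indices -> vectors

def positional_encoding(pos, d):
    # Sinusoidal PE: sin on even dims, cos on odd dims
    return [math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d)) for i in range(d)]

def encode(text):
    # embedded vectors + positional encodings -> sentence matrix
    return [[e + p for e, p in zip(vec, positional_encoding(pos, d_model))]
            for pos, vec in enumerate(embed(tokenize(text)))]

X = encode("attention is all you need")  # 5 tokens x d_model matrix
```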
  • CoCa (Contrastive Captioners) Pretraining method: encoder-decoder models (encoder, dual encoder, decoder), transfer learning. Multimodal: CoCa uses text data + image data. Modality: in the context of human–computer interaction, a modality is the classification of a single independent channel of sensory input/output between a computer and a human. A system is designated unimodal if it has only one modality implemented, and mul.. 2022. 11. 3.