Transformer (Attention Is All You Need)
Autoregressive LM (GPT) vs Autoencoding LM (BERT)
Autoregressive LM: causal language model; predicts each token from the tokens to its left (see the sketch below)
Autoencoding LM: masked language model; predicts masked-out tokens from context on both sides
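A minimal NumPy sketch of the two objectives, not actual GPT/BERT training code: the causal LM uses a lower-triangular mask so position i only sees positions 0..i, while the masked LM corrupts random tokens (the MASK_ID and 15% rate here are illustrative) and predicts the originals.

```python
import numpy as np

rng = np.random.default_rng(0)
token_ids = np.array([5, 12, 7, 3, 9])           # toy token id sequence
seq_len = len(token_ids)

# Causal LM (GPT-style): lower-triangular mask, position i attends to 0..i only
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
print(causal_mask.astype(int))

# Masked LM (BERT-style): replace ~15% of tokens with a [MASK] id,
# then train the model to predict the original tokens at those positions
MASK_ID = 0                                      # hypothetical [MASK] token id
mlm_positions = rng.random(seq_len) < 0.15
corrupted = np.where(mlm_positions, MASK_ID, token_ids)
print(corrupted, mlm_positions)
```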
Transformer Architecture
Tokenizing vs Embedding vs Encoding [1]
- Tokenizing: converts raw text into token indices (IDs)
- Embedding: converts token indices into dense vectors via a lookup table
- Encoding: converts the embedded vectors into a contextualized sentence matrix (see the toy sketch after this list)
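A toy NumPy sketch of the three steps; the vocabulary, random embedding table, and the mean-mixing "encoder" are illustrative stand-ins, not a real tokenizer or a real Transformer encoder.

```python
import numpy as np

# Tokenizing: text -> token indices (hypothetical toy vocabulary)
vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}
text = "attention is all you need"
token_ids = [vocab[w] for w in text.split()]      # [0, 1, 2, 3, 4]

# Embedding: token indices -> vectors via a lookup table (random here)
d_model = 8
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), d_model))
embedded = embedding_table[token_ids]             # shape (5, d_model)

# Encoding: embedded vectors -> contextualized sentence matrix.
# A real Transformer encoder does this with self-attention + FFN layers;
# here each vector is simply mixed with the sequence mean as a stand-in.
encoded = 0.5 * embedded + 0.5 * embedded.mean(axis=0, keepdims=True)
print(encoded.shape)                              # (5, 8): one vector per token
```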
Positional Encoding
Positional encoding describes the location or position of a token in a sequence so that each position is assigned a unique representation. It gives the model access to word-order information, which is needed because self-attention by itself is order-agnostic.
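A sketch of the sinusoidal positional encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the max_len and d_model values below are arbitrary.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]                    # (max_len, 1)
    div_term = 10000 ** (np.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_term)
    pe[:, 1::2] = np.cos(positions / div_term)
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16): each position gets a unique vector added to its embedding
```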
Full Architecture of Transformer
Connection between encoder and decoder: the encoder's output is fed to every decoder layer's cross-attention, where it provides the keys and values while the decoder's own states provide the queries (see the sketch below)
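A minimal sketch of this connection using PyTorch's torch.nn.Transformer; the sequence lengths, batch size, and random inputs are arbitrary placeholders.

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 2, 512)   # (source_len, batch, d_model): encoder input
tgt = torch.rand(7, 2, 512)    # (target_len, batch, d_model): decoder input

# Inside forward(), the encoder output ("memory") is passed to every decoder
# layer's cross-attention block, serving there as keys and values.
out = model(src, tgt)
print(out.shape)               # torch.Size([7, 2, 512])
```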
Self-Attention
Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of that sequence.
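A NumPy sketch of scaled dot-product self-attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are all projections of the same sequence; the random projection matrices and dimensions here are placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 16
x = rng.normal(size=(seq_len, d_model))            # embedded input sequence

# In self-attention, Q, K, and V all come from the same sequence x
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
weights = softmax(scores, axis=-1)                 # each row sums to 1
output = weights @ V                               # contextualized token representations
print(output.shape)                                # (5, 16)
```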
Attention mask: tells the model which tokens it should pay attention to. Masked-out tokens are ignored and cannot be used to compute the model output; the mask discriminates between real tokens and padding tokens (see the sketch below).
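A toy sketch of how a padding mask is applied: masked positions receive a large negative score before the softmax, so they get approximately zero attention weight and never influence the output. The 1 = real token, 0 = padding convention follows the Hugging Face attention_mask style; everything else is illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 6, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))

# 1 = real token, 0 = padding token
attention_mask = np.array([1, 1, 1, 1, 0, 0])

scores = Q @ K.T / np.sqrt(d_k)
# Padding columns get a very negative score -> ~0 weight after softmax,
# so padding tokens never contribute to any token's output.
scores = np.where(attention_mask[None, :] == 1, scores, -1e9)
weights = softmax(scores, axis=-1)
print(weights[:, 4:].sum())   # ~0: no attention mass on the padding positions
```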
References:
[1] https://beausty23.tistory.com/223 (Embedding vs Encoding)