
Transformer (Attention Is All You Need)

kimdj104 2022. 11. 17. 17:15

Autoregressive LM (GPT) vs Autoencoding LM (BERT)

 

Autoregressive LM: a causal language model. It predicts the next token using only the tokens to its left, so text is generated left to right (e.g. GPT).

Autoencoding LM: a masked language model. It predicts masked-out tokens using both the left and right context (e.g. BERT).
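To make the contrast concrete, here is a minimal sketch assuming the Hugging Face transformers library and the gpt2 and bert-base-uncased checkpoints (all assumptions, not something this post specifies): the causal model continues text from the left context only, while the masked model fills in a [MASK] token using context on both sides.

```python
# Sketch of the two objectives (assumes Hugging Face transformers and the
# "gpt2" / "bert-base-uncased" checkpoints, which this post does not specify).
from transformers import pipeline

# Autoregressive / causal LM: predict the next token from the left context only.
generator = pipeline("text-generation", model="gpt2")
print(generator("The Transformer architecture was introduced in", max_new_tokens=10))

# Autoencoding / masked LM: predict a masked token from both left and right context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("The Transformer [MASK] was introduced in 2017."))
```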

 

BERT vs GPT

 

Transformer Architecture

Figure: The Transformer model architecture (Figure 1 of the original paper).

Tokenizing vs Embedding vs Encoding

  • Tokenizing: the process that converts raw text into token indices (IDs) from the tokenizer's vocabulary
  • Embedding: the process that converts token indices into dense vectors
  • Encoding: the process that converts the embedded vectors into a contextualized sentence matrix (one vector per token)

A short sketch of each step follows under the corresponding heading below.

TOKENIZING
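A minimal sketch of the tokenizing step, assuming the Hugging Face transformers library and the bert-base-uncased tokenizer (both are assumptions; any subword tokenizer behaves the same way): the text is split into subword tokens, and each token is mapped to an integer index in the vocabulary.

```python
# Sketch: text -> tokens -> token indices
# (assumes Hugging Face transformers and the "bert-base-uncased" tokenizer).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Attention is all you need"
tokens = tokenizer.tokenize(text)              # subword tokens, e.g. ['attention', 'is', ...]
ids = tokenizer.convert_tokens_to_ids(tokens)  # integer indices into the vocabulary
print(tokens)
print(ids)
```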

 

 

EMBEDDING
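A minimal sketch of the embedding step, assuming PyTorch (the post does not name a framework): each token index is looked up in a learned table and becomes a dense vector of size d_model. The vocabulary size and d_model below are only illustrative.

```python
# Sketch: token indices -> dense vectors via a learned lookup table (assumes PyTorch;
# vocab_size and d_model are illustrative values, not taken from this post).
import torch
import torch.nn as nn

vocab_size, d_model = 30522, 512
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[2023, 2003, 1037, 7953]])  # shape: (batch=1, seq_len=4)
vectors = embedding(token_ids)                         # shape: (1, 4, d_model)
print(vectors.shape)                                   # torch.Size([1, 4, 512])
```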

 

ENCODING
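A minimal sketch of the encoding step, again assuming PyTorch and its built-in nn.TransformerEncoder (an assumption): the embedded vectors are passed through self-attention layers, and the result is one contextualized vector per token, i.e. a sentence matrix.

```python
# Sketch: embedded vectors -> contextualized sentence matrix
# (assumes PyTorch's built-in TransformerEncoder; sizes are illustrative).
import torch
import torch.nn as nn

d_model, seq_len = 512, 4
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

embedded = torch.randn(1, seq_len, d_model)  # stand-in for the embedding output
encoded = encoder(embedded)                  # one contextual vector per token
print(encoded.shape)                         # torch.Size([1, 4, 512])
```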

 

Positional Encoding

Positional encoding describes the location of a token in a sequence so that each position is assigned a unique representation. Since self-attention by itself has no notion of order, positional encoding is what gives meaning to the order of words.
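A minimal sketch of the sinusoidal positional encoding used in the paper, assuming PyTorch: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), and the resulting matrix is added to the token embeddings so every position gets a unique pattern.

```python
# Sketch of the sinusoidal positional encoding from "Attention Is All You Need"
# (assumes PyTorch):
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe                                      # added to the token embeddings

pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # torch.Size([50, 512])
```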

Full Architecture of Transformer

Figure: connection between the encoder and the decoder.

Figure: full architecture of the Transformer.
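A minimal sketch of how the two halves connect, assuming PyTorch's built-in nn.Transformer (an assumption; shapes are illustrative): the decoder attends to the encoder's output through cross-attention while processing the target sequence.

```python
# Sketch: the decoder attends to the encoder output via cross-attention
# (assumes PyTorch's built-in nn.Transformer; shapes are illustrative).
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6,
                       num_decoder_layers=6, batch_first=True)

src = torch.randn(1, 10, 512)  # embedded + position-encoded source sequence
tgt = torch.randn(1, 7, 512)   # embedded + position-encoded target sequence
out = model(src, tgt)          # the encoder output feeds every decoder layer
print(out.shape)               # torch.Size([1, 7, 512])
```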

 

Self-Attention

Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of that sequence.
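A minimal sketch of the scaled dot-product attention behind self-attention, assuming PyTorch and taking Q = K = V = the input vectors for simplicity (in the real model they come from learned projections): Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

```python
# Sketch of scaled dot-product attention (assumes PyTorch):
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# For self-attention, Q, K and V are all derived from the same sequence.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)          # pairwise similarities
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block masked positions
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

x = torch.randn(4, 64)  # 4 tokens of dimension 64; Q = K = V = x for simplicity
out = scaled_dot_product_attention(x, x, x)
print(out.shape)        # torch.Size([4, 64])
```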

 

Attention mask: tells the model which tokens it should pay attention to. Masked-out tokens are ignored by the model and are not used to compute the model output. In practice, the attention mask discriminates between real tokens and padding tokens.
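A minimal sketch of the attention mask in practice, assuming the Hugging Face transformers library and the bert-base-uncased tokenizer (both assumptions): when a batch is padded to the same length, the mask is 1 for real tokens and 0 for padding, so the model only attends to the real tokens.

```python
# Sketch: the attention mask marks real tokens (1) vs padding tokens (0)
# (assumes Hugging Face transformers and the "bert-base-uncased" tokenizer).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(["Attention is all you need", "Hello"],
                  padding=True, return_tensors="pt")
print(batch["input_ids"])       # the shorter sentence is padded to the same length
print(batch["attention_mask"])  # 1 = attend to this token, 0 = ignore (padding)
```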

 

 

 

 
