CoCa (Contrastive Captioners)

kimdj104 2022. 11. 3. 18:57

Pretraining methods:

encoder-decoder models
encoder
dual encoder
decoder
transfer learning

multimodal: CoCa is trained on both text data and image data

modality: In the context of human–computer interaction, a modality is the classification of a single independent channel of sensory input/output between a computer and a human. A system is designated unimodal if it has only one modality implemented, and multimodal if it has more than one. When multiple modalities are available for some tasks or aspects of a task, the system is said to have overlapping modalities. If multiple modalities are available for a task, the system is said to have redundant modalities. Multiple modalities can be used in combination to provide complementary methods that may be redundant but convey information more effectively. Modalities can be generally defined in two forms: human-computer and computer-human modalities.

Losses

Contrastive loss:

pull positive samples (matched image-text pairs) together and push negative samples (mismatched pairs) apart in the embedding space
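
The pull/push idea can be sketched in a few lines. This is only a minimal illustration, not the CoCa implementation: it assumes a batch of N image embeddings and N text embeddings where row i of each belongs to the same pair, and the function names and the temperature value of 0.07 are made up for the example. The loss is the symmetric softmax cross-entropy over the similarity matrix (image-to-text plus text-to-image).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-text contrastive loss (illustrative sketch).

    image_embs[i] and text_embs[i] are assumed to be a positive pair;
    every other combination in the batch is treated as a negative.
    The temperature default is an assumption, not a value from the notes.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j]: temperature-scaled cosine similarity of image i and text j
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # image-to-text: each image should pick its own text out of the batch
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # text-to-image: each text should pick its own image out of the batch
    t2i = -sum(log_softmax([sim[i][j] for i in range(n)])[j] for j in range(n)) / n
    return i2t + t2i


# Matched pairs give a small loss; swapping the texts gives a large one.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this value raises the similarity on the diagonal (positives) and lowers it off the diagonal (negatives), which is the pull and push described above.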

Captioning loss:
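
In the CoCa paper the captioning loss is the autoregressive negative log-likelihood of the caption tokens: the decoder predicts each token given the image and the tokens before it. A minimal sketch under assumptions: `step_logits` stands in for the decoder output (one list of vocabulary scores per position), and the function name is hypothetical.

```python
import math


def captioning_loss(step_logits, target_ids):
    """Sum of -log P(y_t | y_<t, image) over the caption (illustrative sketch).

    step_logits[t] is assumed to hold the decoder's vocabulary scores at
    position t, already conditioned on the image and the previous tokens.
    target_ids[t] is the index of the ground-truth token at position t.
    """
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        # log-softmax of the target token, computed stably
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total -= logits[target] - log_z
    return total


# A decoder that is confident in the right tokens scores lower than a uniform one.
uniform = captioning_loss([[0.0, 0.0, 0.0, 0.0]] * 3, [2, 0, 1])
confident = captioning_loss(
    [[0.0, 0.0, 9.0, 0.0], [9.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]],
    [2, 0, 1],
)
print(confident < uniform)
```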

Zero-shot manner: