What is a token in ViT?
I'm trying to understand the concept of 'token' in the context of Vision Transformer (ViT). Could someone explain what it represents and how it's used in this model?
What is the CLS token in ViT?
I'm trying to understand the concept of the CLS token in the context of Vision Transformer (ViT). Could someone explain its purpose and how it fits into the overall architecture?