MLP-Mixer: An all-MLP Architecture for Vision

hongvin
2 min read · May 10, 2021


Opening Thought: If this were to dominate CV, would that mean the field has evolved MLP -> CNN -> Transformer -> MLP? Back to square one?

MLP-Mixer was introduced by the same team that introduced the Vision Transformer (ViT).

Model Overview

(Figure: MLP-Mixer model overview)

MLP-Mixer is a pure MLP architecture (duh?). First, we split the image into patches. Then, we convert each patch into a feature embedding through an FC layer. Following that, we send the embeddings through N Mixer layers. Finally, we classify the output through another FC layer. Simple, right?
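The patch-splitting and embedding step can be sketched in a few lines of numpy. This is a toy illustration with made-up sizes (a 32x32 RGB image, 16x16 patches, an 8-dim embedding), not the paper's configuration:

```python
import numpy as np

def to_patch_embeddings(image, patch_size, weight, bias):
    """Split an image (H, W, C) into non-overlapping patches and project
    each flattened patch to a feature embedding via a single FC layer."""
    H, W, C = image.shape
    p = patch_size
    # Rearrange (H, W, C) into (num_patches, p * p * C)
    patches = image.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p * p * C)
    # One shared FC projection per patch
    return patches @ weight + bias  # shape: (num_patches, hidden_dim)

# Toy example: 32x32 RGB image, 16x16 patches, hidden dim 8
rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32, 3))
W_proj = rng.normal(size=(16 * 16 * 3, 8))
b_proj = np.zeros(8)
emb = to_patch_embeddings(img, 16, W_proj, b_proj)
print(emb.shape)  # (4, 8): 4 patches, each an 8-dim embedding
```

The resulting `(tokens, channels)` table is exactly what the Mixer layers below operate on.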

Mixer Architecture


Mixer can be divided into channel-mixing MLP (Green box) and token-mixing MLP (Orange box).

  • Channel-mixing MLPs allow communication between different channels.
  • Token-mixing MLPs allow communication between different spatial locations (tokens).

These two types of layers are interleaved to enable the interaction of both input dimensions.

Each MLP is made up of two FC layers with a GELU nonlinearity in between.
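Putting the two mixing steps together, a single Mixer layer can be sketched in numpy as follows. Sizes are toy values for illustration; the pre-LayerNorm and skip connections follow the paper:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two FC layers with GELU in between
    return gelu(x @ w1 + b1) @ w2 + b2

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mixer_layer(x, token_params, channel_params):
    """x has shape (tokens, channels). Token-mixing transposes the table so
    the MLP mixes across tokens; channel-mixing works on rows directly."""
    y = x + mlp(layer_norm(x).T, *token_params).T   # token-mixing MLP
    return y + mlp(layer_norm(y), *channel_params)  # channel-mixing MLP

# Toy sizes: 4 tokens, 8 channels, hidden dims 16 (token) and 32 (channel)
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
tok = (rng.normal(size=(4, 16)), np.zeros(16), rng.normal(size=(16, 4)), np.zeros(4))
ch = (rng.normal(size=(8, 32)), np.zeros(32), rng.normal(size=(32, 8)), np.zeros(8))
out = mixer_layer(x, tok, ch)
print(out.shape)  # (4, 8): same shape as the input, so layers can be stacked
```

Note that the only trick separating the two MLPs is a transpose: the same fc-gelu-fc block mixes across tokens in one case and across channels in the other.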

Variants of MLP-Mixer

The paper specifies Small (S), Base (B), Large (L), and Huge (H) configurations, each combined with a patch size (e.g. Mixer-B/16 uses 16x16 patches).

Results

The authors took the two largest MLP-Mixer configurations and compared them with the SOTA, achieving nearly comparable performance. However, MLP-Mixer's performance drops when the training dataset is smaller.

It is observed that MLP-Mixer and ViT have similar transfer accuracy and throughput, both better than ResNet's.

Code

The official implementation is available [3], as well as in timm [4]. Other PyTorch implementations include [5] and [6].

References

  1. “MLP-Mixer: An all-MLP Architecture for Vision” https://arxiv.org/pdf/2105.01601.pdf
  2. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” https://arxiv.org/pdf/2010.11929
  3. https://github.com/google-research/vision_transformer
  4. https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mlp_mixer.py
  5. https://github.com/lucidrains/mlp-mixer-pytorch
  6. https://github.com/rishikksh20/MLP-Mixer-pytorch

Make MLP great again?


Written by hongvin

PhD Candidate @ University of Malaya. AI enthusiast who likes to code.
