Proposes
- A Squeeze-and-Excitation (SE) block that can be added after a convolutional layer
- This block dynamically “excites” feature maps that help classification and suppresses those that don’t, based on patterns in the global averages of the feature maps

The highlighted part is the SE block. With a small network inside, it tries to detect useful patterns in the global averages of the features, and then excites or suppresses those features in a way that helps classification.
Implementation in Keras
from tensorflow.keras.layers import (Dense, GlobalAveragePooling2D,
                                     Multiply, Reshape)

def se_block(x, ratio=16):
    filters = x.shape[-1]
    # Squeeze: collapse each HxW feature map into one channel statistic
    z = GlobalAveragePooling2D()(x)
    # Excitation: bottleneck MLP producing per-channel scales in (0, 1)
    s = Dense(filters // ratio, activation='relu', use_bias=False)(z)
    s = Dense(filters, activation='sigmoid', use_bias=False)(s)
    s = Reshape((1, 1, filters))(s)  # broadcast against the spatial dims
    return Multiply()([x, s])
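For intuition, here is a minimal NumPy sketch of the same forward pass with random weights (a hypothetical helper, not from the paper): squeeze via a global average, excitation via a ReLU bottleneck and sigmoid, then channel-wise rescaling.

```python
import numpy as np

def se_forward(x, w1, w2):
    """SE forward pass on an NHWC tensor x, using given (random) weights."""
    z = x.mean(axis=(1, 2))                # squeeze: (N, C)
    h = np.maximum(z @ w1, 0.0)            # ReLU bottleneck: (N, C // ratio)
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))    # sigmoid scales in (0, 1): (N, C)
    return x * s[:, None, None, :]         # excite/suppress each channel

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 32))         # batch of 2, 32 channels
w1 = rng.normal(size=(32, 2))              # ratio 16 -> hidden width 2
w2 = rng.normal(size=(2, 32))
y = se_forward(x, w1, w2)
print(y.shape)                             # same shape as x: (2, 8, 8, 32)
```

Because the scales lie strictly between 0 and 1, every channel is attenuated (never amplified in magnitude) relative to the input.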
Insights
- This paper assumes that the global averages of feature maps contain patterns that are useful for classification
- By recognizing such patterns, the SE block can “dynamically” scale feature maps (multiplying each by a value in 0–1) in a way that helps classification, exciting relevant features and suppressing irrelevant ones
- From another perspective, the scale of each feature map now depends on the values of the other feature maps, so SE blocks can be viewed as “modelling interdependencies between channels”
- The term “squeeze” refers to the global average pooling step, which squeezes each feature map’s spatial information into a single channel descriptor; the “excitation” is done by a bottleneck network with a single hidden layer (ReLU) and sigmoid outputs
- The bottleneck is mainly there for scalability: without it, the parameter cost of the excitation layers would be huge (the paper suggests a reduction ratio of 16)
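To make the scalability point concrete, here is a quick back-of-the-envelope count (plain Python; the channel count C = 512 is just an illustrative example). The two bias-free Dense layers cost C·(C/r) + (C/r)·C parameters, so the ratio r divides the cost directly:

```python
def se_params(channels, ratio):
    # Two bias-free Dense layers: channels -> channels // ratio -> channels
    hidden = channels // ratio
    return channels * hidden + hidden * channels

C = 512
print(se_params(C, 1))   # no bottleneck: 2 * 512^2 = 524288 params
print(se_params(C, 16))  # ratio 16: 32768 params, a 16x reduction
```

With many SE blocks in a deep network, this 16x saving is what keeps the overall parameter overhead small.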
- The effect of the SE block is minuscule in earlier layers because there is no strong pattern to recognize: early features are largely class-agnostic and shared across classes
- The effect of the SE block grows in later layers, where features are more specialized per class, so useful patterns are easier to find
- So, if your network is not that deep, SE blocks won’t help much
- In actual implementations, the overhead of the SE block is rather small

In the top row, the lines are mostly identical, so SE blocks don’t help much in these cases. In the second row, however, the lines are clearly separated, which means the SE blocks are doing their job.
Original Paper: https://arxiv.org/pdf/1709.01507.pdf