Proposes
- A Squeeze-and-Excitation (SE) block that can be added after a convolutional layer
- This block dynamically “excites” feature maps that help classification and suppresses those that don’t, based on patterns in the global averages of the feature maps

The highlighted part is the SE block. With a small network inside, it tries to detect useful patterns in the global averages of the features, and then excites or suppresses those features in a way that helps classification.
Implementation in Keras
from tensorflow.keras.layers import (Dense, GlobalAveragePooling2D,
                                     Multiply, Reshape)

def se_block(x, ratio=16):
    filters = x.shape[-1]
    # Squeeze: collapse each HxW feature map into one channel statistic
    z = GlobalAveragePooling2D()(x)
    # Excitation: bottleneck MLP producing per-channel scales in (0, 1)
    s = Dense(filters // ratio, activation='relu', use_bias=False)(z)
    s = Dense(filters, activation='sigmoid', use_bias=False)(s)
    s = Reshape((1, 1, filters))(s)  # broadcast against the spatial dims
    return Multiply()([x, s])
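For intuition, here is a minimal NumPy sketch of the same forward pass with random weights (a hypothetical helper, not from the paper): squeeze via a global average, excitation via a ReLU bottleneck and sigmoid, then channel-wise rescaling.

```python
import numpy as np

def se_forward(x, w1, w2):
    """SE forward pass on an NHWC tensor x, using given (random) weights."""
    z = x.mean(axis=(1, 2))                # squeeze: (N, C)
    h = np.maximum(z @ w1, 0.0)            # ReLU bottleneck: (N, C // ratio)
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))    # sigmoid scales in (0, 1): (N, C)
    return x * s[:, None, None, :]         # excite/suppress each channel

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 32))         # batch of 2, 32 channels
w1 = rng.normal(size=(32, 2))              # ratio 16 -> hidden width 2
w2 = rng.normal(size=(2, 32))
y = se_forward(x, w1, w2)
print(y.shape)                             # same shape as x: (2, 8, 8, 32)
```

Because the scales lie strictly between 0 and 1, every channel is attenuated (never amplified in magnitude) relative to the input.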
Insights
- This paper assumes that the global averages of feature maps contain patterns that are useful for classification
- By recognizing such patterns, the SE block can “dynamically” scale feature maps (multiplying each by a value in 0–1) in a way that helps classification, exciting relevant features and suppressing irrelevant ones
- From another perspective, the scale of each feature map now depends on the values of the other feature maps, so SE blocks can be viewed as “modelling interdependencies between channels”
- The term “squeeze” refers to the global average pooling step, which squeezes each feature map’s spatial information into a single channel descriptor; the “excitation” is done by a bottleneck network with a single hidden layer (ReLU) and sigmoid outputs
- The bottleneck is mainly there for scalability: without it, the parameter cost of the excitation layers would be huge (the paper suggests a reduction ratio of 16)
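To make the scalability point concrete, here is a quick back-of-the-envelope count (plain Python; the channel count C = 512 is just an illustrative example). The two bias-free Dense layers cost C·(C/r) + (C/r)·C parameters, so the ratio r divides the cost directly:

```python
def se_params(channels, ratio):
    # Two bias-free Dense layers: channels -> channels // ratio -> channels
    hidden = channels // ratio
    return channels * hidden + hidden * channels

C = 512
print(se_params(C, 1))   # no bottleneck: 2 * 512^2 = 524288 params
print(se_params(C, 16))  # ratio 16: 32768 params, a 16x reduction
```

With many SE blocks in a deep network, this 16x saving is what keeps the overall parameter overhead small.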
- The effect of the SE block is minuscule in earlier layers because there is no strong pattern to recognize: early features are largely class-agnostic and shared across classes
- The effect of the SE block grows in later layers, where features are more specialized per class, so useful patterns are easier to find
- So, if your network is not that deep, SE blocks won’t help much
- In actual implementations, the overhead of the SE block is rather small

In the top row, the lines are mostly identical, so SE blocks don’t help much in these cases. In the second row, however, the lines are clearly separated, which means the SE blocks are doing their job.
Original Paper: https://arxiv.org/pdf/1709.01507.pdf