Proposes

The highlighted part is the SE block. It contains a small network that looks for useful patterns in the global average of each feature channel, and then excites or suppresses those channels in a way that helps classification.
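As a rough illustration of that idea, here is a minimal NumPy sketch of the three steps (squeeze, excite, scale) on random data. The weights `w1` and `w2` are random stand-ins for the learned small network, and the function name `se_forward` and the tensor sizes are made up for this example.

```python
import numpy as np

def se_forward(x, w1, w2):
    """Apply squeeze-and-excitation to x of shape (batch, H, W, C)."""
    # Squeeze: global average over the spatial axes -> (batch, C).
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = np.maximum(z @ w1, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))
    # Scale: broadcast the per-channel gates over height and width.
    return x * gates[:, None, None, :], gates

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 32))    # hypothetical feature map, 32 channels
w1 = rng.normal(size=(32, 2)) * 0.1   # assumed bottleneck: 32 // 16 = 2 units
w2 = rng.normal(size=(2, 32)) * 0.1   # back up to 32 channels
y, gates = se_forward(x, w1, w2)
print(y.shape, gates.shape)
```

Each channel gets a single gate between 0 and 1, so the block can only rescale whole channels. It never changes individual spatial positions within a channel.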

Implementation in Keras

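from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Multiply, Reshape

def se_block(x, ratio=16):
    # Channel count of a channels-last feature map (batch, H, W, C).
    filters = x.shape[-1]
    # Squeeze: average each channel over its spatial positions.
    z = GlobalAveragePooling2D()(x)
    # Excitation: bottleneck network producing one gate in (0, 1) per channel.
    s = Dense(filters // ratio, activation='relu', use_bias=False)(z)
    s = Dense(filters, activation='sigmoid', use_bias=False)(s)
    # Reshape to (1, 1, C) so the gates broadcast over height and width.
    s = Reshape((1, 1, filters))(s)
    # Scale: reweight the input channels by their gates.
    return Multiply()([x, s])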
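To get a feel for what `ratio` controls, the back-of-the-envelope sketch below counts the weights that the two bias-free Dense layers in `se_block` add. The helper name `se_param_count` is hypothetical, and the channel counts are arbitrary examples.

```python
def se_param_count(filters, ratio=16):
    """Weights in the two bias-free Dense layers of se_block."""
    hidden = filters // ratio
    # First Dense maps filters -> hidden, second maps hidden -> filters.
    return filters * hidden + hidden * filters

for c in (64, 256, 1024):
    print(c, se_param_count(c))
```

A smaller `ratio` widens the bottleneck and so adds more weights. For example, `se_param_count(256, ratio=4)` gives 32768, against 8192 at the default of 16.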

Insights

In the top row, you can see that all the lines are mostly identical, so SE blocks don’t help in these cases. In the second row, however, the lines are separated, which means the SE blocks are doing their job.

Original Paper https://arxiv.org/pdf/1709.01507.pdf