If you want the attention over the time dimension, then this part of the code looks correct to me:
activations = LSTM(units, return_sequences=True)(embedded)
# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)
sent_representation = merge([activations, attention], mode='mul')
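For reference, here is a minimal self-contained sketch of the same time-wise attention in the Keras 2 / tf.keras API (the snippet above uses the old Keras 1 merge). The vocabulary size, sequence length, embedding size and number of units below are placeholder values, not taken from the original model:

from tensorflow.keras import layers

vocab_size, max_length, embed_dim, units = 10000, 100, 128, 64

inputs = layers.Input(shape=(max_length,))
embedded = layers.Embedding(vocab_size, embed_dim)(inputs)
activations = layers.LSTM(units, return_sequences=True)(embedded)  # (batch, max_length, units)

# one scalar importance score per time step
attention = layers.Dense(1, activation='tanh')(activations)        # (batch, max_length, 1)
attention = layers.Flatten()(attention)                            # (batch, max_length)
attention = layers.Activation('softmax')(attention)                # weights over time steps
attention = layers.RepeatVector(units)(attention)                   # (batch, units, max_length)
attention = layers.Permute([2, 1])(attention)                       # (batch, max_length, units)

# multiply() is the Keras 2 replacement for merge(..., mode='mul')
sent_representation = layers.multiply([activations, attention])     # (batch, max_length, units)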
You have already worked out the attention vector of shape (batch_size, max_length) here:
attention = Activation('softmax')(attention)
The remaining step is this sum over the time dimension:
K.sum(xin, axis=-2)
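In the functional API this reduction is presumably wrapped in a Lambda layer and applied to sent_representation; a sketch under that assumption, summing out the time steps to get the final sentence representation:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda

# sent_representation has shape (batch, max_length, units); axis=-2 is the
# time axis, so the sum collapses the weighted activations to (batch, units)
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2),
                             output_shape=(units,))(sent_representation)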