Calculating attention scores in Bahdanau attention in TensorFlow using decoder hidden state and encoder output

This question relates to the neural machine translation shown here: Neural Machine Translation

self.W1 and self.W2 are initialized in lines 4 and 5 in the __init__ function of class BahdanauAttention.

Units (the number of RNN units) is set to 1024 elsewhere in the code, and the BahdanauAttention class, which subclasses tf.keras.layers.Layer, is initialized with the value 10.

My question is:

In the code image attached, I am not sure I understand the feed-forward neural network set up in lines 17 and 18. So I broke this formula down into its parts. See lines 23 and 24.

a) I am not quite sure what self.W1(query_with_time_axis) and self.W2(values) mean, or how to read these constructs. This does not appear to be a dot product of self.W1 with query_with_time_axis, or of self.W2 with values.

Are self.W1(query_with_time_axis) and self.W2(values) an initialization of the weights of self.W1 with the contents of query_with_time_axis, and of the weights of self.W2 with the contents of values? Adding weights alone does not make sense either.

Were self.W1 and self.W2 randomly initialized during class initialization? These operations don't look like the logit Z = WX + b (with the bias ignored).

These constructs result in tensors with dimensions (64, 1, 10) and (64, 16, 10) respectively.
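For reference, here is a minimal NumPy sketch of the shape arithmetic behind those two results. It assumes that self.W1 and self.W2 are tf.keras.layers.Dense layers with 10 units (the code image is not reproduced here, so that is an assumption), and it relies on the documented behavior of a Dense layer: calling it on an input multiplies the last axis of the input by a kernel of shape (input_dim, units) and adds a bias. The names dense, W1_kernel, W2_kernel and the random values are hypothetical stand-ins, not the tutorial's code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, src_len, hidden, units = 64, 16, 1024, 10

# Random stand-ins with the shapes given in the question.
query_with_time_axis = rng.normal(size=(batch, 1, hidden))
values = rng.normal(size=(batch, src_len, hidden))

# Hypothetical stand-ins for the weights a Dense layer with 10 units would own:
# a kernel of shape (input_dim, units) and a bias of shape (units,).
W1_kernel = rng.normal(size=(hidden, units)) * 0.01
W1_bias = np.zeros(units)
W2_kernel = rng.normal(size=(hidden, units)) * 0.01
W2_bias = np.zeros(units)


def dense(x, kernel, bias):
    # Matrix product over the last axis of x, plus bias (no activation).
    return x @ kernel + bias


w1_out = dense(query_with_time_axis, W1_kernel, W1_bias)
w2_out = dense(values, W2_kernel, W2_bias)

print(w1_out.shape)  # (64, 1, 10)
print(w2_out.shape)  # (64, 16, 10)
```

Under that assumption, the trailing 10 is the number of output units of each layer: the 1024-wide last axis is projected down to 10, and the leading axes are left unchanged.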

b) The parameter of 10 in the class initialization is 10 layers and not 10 output neurons. So I am not sure why 10 is the third dimension of these output tensors?

I know:

a) query_with_time_axis is a tensor with dimensions (64, 1, 1024)

b) values is a tensor with dimensions (64, 16, 1024)

query_with_time_axis and values can be added together, because the axis of size 1 is broadcast during the addition.
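The broadcasting can be checked with a small NumPy sketch. The arrays below are constant-valued stand-ins that only share the shapes quoted above; they are not the real tensors from the tutorial.

```python
import numpy as np

# Stand-ins with the shapes from the question.
query_with_time_axis = np.ones((64, 1, 1024))
values = np.full((64, 16, 1024), 2.0)

# The axis of size 1 is stretched to 16 during the addition.
summed = query_with_time_axis + values
print(summed.shape)  # (64, 16, 1024)

# Same result as repeating the query explicitly along axis 1.
explicit = np.repeat(query_with_time_axis, 16, axis=1) + values
print(np.array_equal(summed, explicit))  # True
```

The same rule applies to tensors of shape (64, 1, 10) and (64, 16, 10), whose sum has shape (64, 16, 10).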

See lines 17, 18, 23, and 24.



from Recent Questions - Stack Overflow https://ift.tt/342EDVF
https://ift.tt/2G9WcuO
