new essay: attention-and-multiattention.md

2022-09-20 14:00:13 +08:00 · 2022-09-20 14:00:13 +08:00 · b319d4813b
parent c7d4bf61d7
commit b319d4813b
1 changed files with 46 additions and 0 deletions
--- a/content/essays/attention-and-multiattention.md
+++ b/content/essays/attention-and-multiattention.md
@@ -0,0 +1,46 @@
 ---
 title: "Attention and Multiattention"
 date: 2022-09-19T20:35:33+08:00
 tags: []
 categories: []
 weight: 50
 show_comments: true
 katex: true
 draft: false
 ---
 <!--more-->
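 ## Self-Attention
 Attention takes a query and a set of key-value pairs and returns a weighted sum of the values, where each weight comes from scoring the query against the corresponding key: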
 $$
 Attention(Q, K, V) = V \cdot softmax(score(Q, K))
 $$
 When Q, K, and V are all the same input (Q = K = V), this computation is called self-attention.
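
As a rough illustration of the weighted sum described above, here is a minimal NumPy sketch. The essay does not say which score function is used, so this assumes the scaled dot product; the helper names `softmax` and `attention` and the toy shapes are made up for the example. Vectors are stored as columns so that the code mirrors the "V times softmax" form of the formula.

```python
import numpy as np

def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Columns are vectors: Q is (d, n_q), K is (d, n_k), V is (d_v, n_k)."""
    d = Q.shape[0]
    # Assumed score function: scaled dot product of every key with every query.
    scores = K.T @ Q / np.sqrt(d)        # (n_k, n_q)
    weights = softmax(scores, axis=0)    # each column sums to 1 over the keys
    return V @ weights                   # (d_v, n_q): weighted sums of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # 5 tokens, each an 8-dimensional column
out = attention(X, X, X)      # self-attention: Q = K = V = X
print(out.shape)
```

Each output column is a convex combination of the columns of `X`, which is all that "weighted sum" means here. Note that nothing in `attention` is trainable.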
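 ## Multi-Head Attention
 Suppose Head = 4. Each head then receives the full-size Q, K, and V, so attention is effectively computed 4 times. Every W is a learnable parameter: the attention computation itself has no learnable parameters, so a linear transformation is applied before it, and the parameters of that transformation are what get trained.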
 $$
 head_1 = Attention(W_1^Q Q, W_1^K K, W_1^V V)
 \\
 head_2 = Attention(W_2^Q Q, W_2^K K, W_2^V V)
 \\
 head_3 = Attention(W_3^Q Q, W_3^K K, W_3^V V)
 \\
 head_4 = Attention(W_4^Q Q, W_4^K K, W_4^V V)
 $$
 Finally, the 4 attention results are concatenated and passed through one more, larger linear transformation:
 $$
 Multihead(Q, K, V) = W^O[head_1, head_2, head_3, head_4]
 $$
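
The four heads and the final projection can be sketched in NumPy as follows. This is only an illustration under several assumptions that are not in the essay: the score function is the scaled dot product, each head projects down to `d_model / n_heads` dimensions (a common choice, not a requirement), and random matrices stand in for the learned W parameters. The names `multihead`, `W_q`, `W_k`, `W_v`, and `W_o` are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Columns are vectors; assumed score: scaled dot product.
    scores = K.T @ Q / np.sqrt(Q.shape[0])
    return V @ softmax(scores, axis=0)

def multihead(Q, K, V, W_q, W_k, W_v, W_o):
    # One attention call per head, each with its own linear maps applied first.
    heads = [
        attention(wq @ Q, wk @ K, wv @ V)
        for wq, wk, wv in zip(W_q, W_k, W_v)
    ]
    # Stack the head outputs along the feature axis, then mix them with W_o.
    return W_o @ np.concatenate(heads, axis=0)

rng = np.random.default_rng(0)
d_model, n_tokens, n_heads = 8, 5, 4
d_head = d_model // n_heads   # assumption: split the model width evenly

X = rng.normal(size=(d_model, n_tokens))
# Random stand-ins for the learnable parameters, one matrix per head.
W_q = rng.normal(size=(n_heads, d_head, d_model))
W_k = rng.normal(size=(n_heads, d_head, d_model))
W_v = rng.normal(size=(n_heads, d_head, d_model))
W_o = rng.normal(size=(d_model, n_heads * d_head))

out = multihead(X, X, X, W_q, W_k, W_v, W_o)   # multi-head self-attention
print(out.shape)
```

In a real model the `W_*` arrays would be updated by gradient descent; they are the only trainable pieces in this sketch.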
 ## References
 https://www.adityaagrawal.net/blog/deep_learning/attention