
---
title: "自注意力和多头注意力"
date: 2022-09-19T20:35:33+08:00
tags: []
categories: []
weight: 50
show_comments: true
katex: true
draft: false
---
<!--more-->
## Self-Attention
As is well known, attention computes a weighted sum of the values in a set of key-value pairs, with the weights determined by a query, as follows:
$$
Attention(Q, K, V) = softmax(score(Q, K)) V
$$
When Q == K == V, i.e. the queries, keys, and values all come from the same input, this computation is called self-attention.
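The formula above leaves `score` unspecified. As a minimal illustrative sketch (not code from this post), the NumPy snippet below assumes the scaled dot-product score `Q K^T / sqrt(d_k)` from the Transformer paper; the helper names `softmax` and `attention` are made up here:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product score: score(Q, K) = Q K^T / sqrt(d_k).
    # The softmax over the keys gives the weights for the weighted sum of V.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

# Self-attention: Q, K and V are all the same sequence X.
X = np.random.randn(5, 8)   # 5 tokens, model dimension 8
out = attention(X, X, X)    # shape (5, 8)
```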
## Multi-Head Attention
Suppose Head = 4. The computation is as follows, where each Q, K, V is the full-size Q, K, V, so this amounts to performing attention 4 times. Each W is a learnable parameter: because the attention computation itself has no learnable parameters, a linear transformation is applied to Q, K, and V before computing attention, and it is the parameters of these linear transformations that are trained.
$$
head_1 = Attention(W_1^Q Q, W_1^K K, W_1^V V)
\\\\
head_2 = Attention(W_2^Q Q, W_2^K K, W_2^V V)
\\\\
head_3 = Attention(W_3^Q Q, W_3^K K, W_3^V V)
\\\\
head_4 = Attention(W_4^Q Q, W_4^K K, W_4^V V)
$$
Finally, the 4 attention results are concatenated and passed through another, larger linear transformation:
$$
Multihead(Q, K, V) = W^O[head_1, head_2, head_3, head_4]
$$
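Continuing the illustrative sketch above (again not the post's own code), this reuses the `attention` helper, keeps each head at the full model size as described, and uses the row-vector convention `X @ W` in place of the `W^Q Q` form written above:

```python
def multihead(Q, K, V, W_Q, W_K, W_V, W_O):
    # W_Q, W_K, W_V hold one projection matrix per head; each head runs
    # full attention on its own linearly transformed copy of Q, K and V.
    heads = [attention(Q @ Wq, K @ Wk, V @ Wv)
             for Wq, Wk, Wv in zip(W_Q, W_K, W_V)]
    # Concatenate the 4 head outputs, then apply the large output
    # transformation W^O.
    return np.concatenate(heads, axis=-1) @ W_O

# 4 heads, model dimension 8, each head kept at the full size as in the text.
d, h = 8, 4
W_Q = [np.random.randn(d, d) for _ in range(h)]
W_K = [np.random.randn(d, d) for _ in range(h)]
W_V = [np.random.randn(d, d) for _ in range(h)]
W_O = np.random.randn(h * d, d)

X = np.random.randn(5, d)
out = multihead(X, X, X, W_Q, W_K, W_V, W_O)   # shape (5, 8)
```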
## References
https://www.adityaagrawal.net/blog/deep_learning/attention