
Attention as matrix multiplication

2023-02-27 20:38 Author: 學的很雜的一個人


Source: https://e2eml.school/transformers.html#softmax

A Chinese-English bilingual version, with Chinese annotations produced by various translation tools and a little of my own understanding.


Related articles are collected in: Transformers from Scratch (Chinese annotations)

--------------------------------------------------------------------------------------------------------------------


Feature weights could be straightforward to build by counting how often each word pair/next word transition occurs in training, but attention masks are not.
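As a rough sketch of that counting idea (the toy corpus and variable names below are invented for illustration, not taken from the article), each (word pair, next word) transition in the training text can simply be tallied:

from collections import Counter

# Invented toy training text, just to show the counting.
corpus = "check the program check the battery".split()

# Tally how often each (word pair -> next word) transition occurs.
pair_to_next_counts = Counter(
    ((corpus[i], corpus[i + 1]), corpus[i + 2])
    for i in range(len(corpus) - 2)
)

print(pair_to_next_counts[(("check", "the"), "program")])  # 1
print(pair_to_next_counts[(("check", "the"), "battery")])  # 1

Normalizing counts like these could then serve as the transition-based feature weights described above.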

Up to this point, we've pulled the mask vector out of thin air.

How transformers find the relevant mask matters.

It would be natural to use some sort of lookup table, but now we are focusing hard on expressing everything as matrix multiplications.

We can use the same lookup method we introduced above by stacking the mask vectors for every word into a matrix and using the one-hot representation of the most recent word to pull out the relevant mask.

In the matrix showing the collection of mask vectors, we've only shown the one we're trying to pull out, for clarity.

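As a minimal sketch of this lookup (the four-word vocabulary, mask values, and variable names are invented for illustration), stacking one mask per row and multiplying by a one-hot vector picks out the matching row:

import numpy as np

# Invented vocabulary of four words; row i is the attention mask for word i.
mask_matrix = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# One-hot representation of the most recent word (word 2 here).
most_recent_word = np.array([0.0, 0.0, 1.0, 0.0])

# The matrix multiplication pulls out row 2, the relevant mask.
relevant_mask = most_recent_word @ mask_matrix
print(relevant_mask)  # [0. 1. 1. 0.]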

We're finally getting to the point where we can start tying into the paper.

This mask lookup is represented by the QK^T term in the attention equation.

The query Q represents the feature of interest and the matrix K represents the collection of masks.

Because it's stored with masks in columns, rather than rows, it needs to be transposed (with the T operator) before multiplying.
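Continuing the toy sketch above (the values are again invented), K stores one mask per column, so the transpose puts the masks back into rows, and a one-hot query row pulls out the relevant one:

import numpy as np

# Invented example: column j of K is the mask associated with word j.
K = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Q is a one-hot row selecting the feature of interest (word 2 here).
Q = np.array([[0.0, 0.0, 1.0, 0.0]])

# Q @ K.T pulls out column 2 of K, i.e. the relevant mask.
mask = Q @ K.T
print(mask)  # [[0. 1. 1. 0.]]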

By the time we're all done, we'll make some important modifications to this, but at this level it captures the concept of a differentiable lookup table that transformers make use of.
