

Sequence to Sequence + Attention Model

0. Word-to-word translation

  • What is the simplest way to translate? Translate each word on its own.
  • Problem 1) English and Korean have different word orders.
    • I love you → translate → 난 사랑해 너 (awkward, because Korean puts the verb last)
  • Problem 2) The output always has the same number of words as the input, even when it should not!
    • How are you (3 words) → 잘 지내 (2 words)
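A minimal sketch of word-to-word translation, assuming a hypothetical toy dictionary (`WORD_DICT` and `word_to_word` are illustrative names, not from any library). It reproduces both problems above: the English word order is kept, and the output always has exactly as many words as the input.

```python
from typing import List

# Hypothetical toy dictionary: exactly one Korean word per English word.
WORD_DICT = {"i": "난", "love": "사랑해", "you": "너", "how": "어떻게", "are": "있어"}

def word_to_word(sentence: str) -> List[str]:
    """Translate each word on its own, keeping the English word order."""
    return [WORD_DICT.get(w, "<unk>") for w in sentence.lower().split()]

print(word_to_word("I love you"))   # ['난', '사랑해', '너']: S V O order kept
print(word_to_word("How are you"))  # ['어떻게', '있어', '너']: 3 words out, though '잘 지내' has 2
```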

1. RNN

  • Context vector: a single vector that condenses the information in "I love you", produced by the RNN after it has read the whole sentence
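As a rough illustration, here is a vanilla RNN that folds a sentence into its hidden state. The 2-dimensional embeddings and the weight matrices `W_X`, `W_H` are hypothetical placeholders (a trained model learns them); the point is only that the last hidden state, used as the context vector, has a fixed size no matter how many words are read.

```python
import math
from typing import List

# Hypothetical 2-dimensional word embeddings and RNN weights (a real model learns these).
EMBED = {"I": [1.0, 0.0], "love": [0.0, 1.0], "you": [1.0, 1.0]}
W_X = [[0.5, -0.3], [0.2, 0.4]]  # input-to-hidden weights
W_H = [[0.1, 0.0], [0.0, 0.1]]   # hidden-to-hidden weights

def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_encode(words: List[str]) -> List[List[float]]:
    """Vanilla RNN, h_t = tanh(W_X x_t + W_H h_{t-1}); returns the hidden state after every word."""
    h = [0.0, 0.0]
    states = []
    for w in words:
        pre = [a + b for a, b in zip(matvec(W_X, EMBED[w]), matvec(W_H, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

states = rnn_encode(["I", "love", "you"])
context_vector = states[-1]  # the final hidden state summarizes "I love you"
print(len(context_vector))   # 2: fixed by the hidden size, not by the sentence length
```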

2. Sequence to Sequence

  • Translation starts from the context vector.
  • The decoder keeps producing words until it outputs the End token.
  • This way, three input words can produce an output that ends after only two, and through training the model can also learn to reorder S V O → S O V.
  • This is called the encoder-decoder architecture, or a sequence-to-sequence (seq2seq) model.
  • Role of the encoder: reads the words one at a time and finally produces the context vector.
  • Role of the decoder: takes the context vector and generates the translation word by word, in order, from the Start token until the End token.
  • Problem: this works for short sentences, but breaks down when the sentence has many words.
    • The reason is that the context vector is a single vector of fixed size.
    • Unless the context vector is large enough, it is too small to compress all of the information.
    • How can this be solved? With the attention mechanism.
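The decode-until-End loop can be sketched as follows. The token strings `<start>`/`<end>` and the `scripted_step` function are hypothetical: a real decoder step is a trained RNN cell plus a softmax over the vocabulary, replaced here by a fixed script so the loop runs on its own. Note that the decoder only ever sees the one fixed-size `context` vector, and that the output length is decided by when the End token appears, not by the input length.

```python
from typing import Callable, List, Tuple

START, END = "<start>", "<end>"  # hypothetical special-token names

# (previous token, decoder state, context vector) -> (next token, new decoder state)
Step = Callable[[str, List[float], List[float]], Tuple[str, List[float]]]

def greedy_decode(context: List[float], step: Step, max_len: int = 20) -> List[str]:
    """Emit tokens starting from START until the step function returns END (or max_len)."""
    state = context  # the decoder state is initialized from the context vector
    token = START
    output: List[str] = []
    for _ in range(max_len):
        token, state = step(token, state, context)  # the same context at every step
        if token == END:
            break
        output.append(token)
    return output

# Hypothetical stand-in for a trained decoder step: it replays a fixed answer.
SCRIPT = {START: "잘", "잘": "지내", "지내": END}

def scripted_step(prev: str, state: List[float], context: List[float]) -> Tuple[str, List[float]]:
    return SCRIPT[prev], state

# [0.3, -0.1] plays the role of the context vector for "How are you" (made up).
print(greedy_decode([0.3, -0.1], scripted_step))  # ['잘', '지내']
```

The `max_len` guard matters in practice: an untrained or badly trained decoder may never emit the End token.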

3. Encoder-decoder vs. attention

1) Encoder-decoder

  • The encoder-decoder architecture does not use all of the states produced by the encoder.
  • It simply takes the last output vector and calls it the context vector,
  • and the whole translation is generated from that one context vector.
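In equation form, with notation assumed here rather than taken from the post (x_t are the input words, h_t the encoder states, T the input length, y_t the output words, s_t the decoder states, and f, g the encoder and decoder RNN cells), the plain encoder-decoder is roughly:

```latex
\begin{aligned}
h_t &= f(h_{t-1}, x_t), \qquad t = 1, \dots, T \\
c   &= h_T \\
s_t &= g(s_{t-1}, y_{t-1}, c)
\end{aligned}
```

The same c is reused at every decoding step, which is exactly the fixed-size bottleneck described above.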

2) Attention

  • (Feature 1) Encode the information into a sequence of vectors, not into a single context vector
    • The idea is to use the state of every RNN cell in the encoder.
    • Using these states, a new context vector is built dynamically at each decoding step, which solves the fixed-size context vector problem.
    • There is no longer one fixed context vector shared by the whole translation.
    • Instead, a new context vector is created at every step.
  • (Feature 2) Choose a subset of these vectors adaptively while decoding the translation
    • Among all the encoder states, a separate mechanism can be designed to focus only on the words that matter for the word currently being generated.
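In equation form (notation assumed here, not from the post: h_i are the encoder states for i = 1..T, s_{t-1} is the previous decoder state, y_{t-1} the previous output word, g the decoder RNN cell, and score any scoring function), attention computes a new context vector c_t at every decoding step:

```latex
\begin{aligned}
e_{t,i} &= \operatorname{score}(s_{t-1}, h_i), \qquad i = 1, \dots, T \\
\alpha_{t,i} &= \frac{\exp(e_{t,i})}{\sum_{j=1}^{T} \exp(e_{t,j})} \\
c_t &= \sum_{i=1}^{T} \alpha_{t,i}\, h_i \\
s_t &= g(s_{t-1}, y_{t-1}, c_t)
\end{aligned}
```

Each c_t still has the size of one encoder state, but it is recomputed at every step, and the weights alpha_{t,i} are what let the decoder focus on a subset of the states. In the walkthrough below, the first query s_0 is the last encoder state h3.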

3) Translating "I love you" into "난 널 사랑해", step by step

  • FC: uses the states of all the RNN cells from the encoder part.
  • FC(h3): the final encoder state h3 is also fed in, because the decoder has not produced a state yet.
  • Softmax: the FC layer gives each encoder RNN-cell state a score, and softmax turns these scores into attention weights that sum to 1.
  • cv1 (first context vector): the sum of each RNN-cell state multiplied by its attention weight.
  • cv1 is fed, together with the Start token, into the first RNN cell of the decoder.
  • Output: '난'
  • dh1 is produced.
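A minimal sketch of this first step in plain Python. Everything numeric is hypothetical: `H` stands in for h1, h2, h3, and `W`, `V` are untrained FC weights. The post does not spell out the FC layer's exact shape, so this assumes Bahdanau-style additive scoring, score_i = V · tanh(W [h_i; query]). The tanh matters: a purely linear score on [h_i; query] would add the same query term to every score, softmax would cancel it, and the query could never change the weights.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fc_score(h, query, w, v):
    """Additive score v . tanh(W [h; query]) (an assumed layout for the FC layer)."""
    concat = h + query  # list concatenation: [h; query]
    hidden = [math.tanh(sum(wij * xj for wij, xj in zip(row, concat))) for row in w]
    return sum(vi * hi for vi, hi in zip(v, hidden))

def attend(encoder_states, query, w, v):
    """Score every encoder state, softmax the scores, and return (weights, context)."""
    scores = [fc_score(h, query, w, v) for h in encoder_states]
    weights = softmax(scores)  # attention weights, they sum to 1
    dim = len(encoder_states[0])
    context = [sum(a * h[k] for a, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

# Hypothetical 2-dimensional encoder states h1, h2, h3 for "I", "love", "you".
H = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
W = [[0.4, -0.2, 0.3, 0.1], [-0.1, 0.5, 0.2, -0.3]]  # hypothetical FC weights (2 x 4)
V = [1.0, -1.0]                                      # hypothetical scoring vector

# Step 1: there is no decoder state yet, so h3 is the query.
weights1, cv1 = attend(H, H[-1], W, V)

# Later steps use a decoder state instead; this dh1 value is made up for illustration.
dh1 = [0.3, -0.6]
weights2, cv2 = attend(H, dh1, W, V)
```

Calling `attend` again with a decoder state as the query gives different weights and therefore a different context vector, which is what the following steps do with dh1, dh2 and dh3.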

  • dh1: the decoder state dh1 goes into the FC layer ⇒ FC(dh1)
  • cv2: a new context vector, different from cv1
  • cv2 is fed, together with '난', into the second RNN cell of the decoder.
  • Output: '널'
  • dh2 is produced.

  • dh2: the decoder state dh2 goes into the FC layer ⇒ FC(dh2)
  • cv3 is produced.
  • cv3 is fed, together with '널', into the third RNN cell of the decoder.
  • Output: '사랑해'
  • dh3 is produced.

  • dh3: the decoder state dh3 goes into the FC layer ⇒ FC(dh3)
  • cv4 is produced.
  • Output: the End token
  • The translation is finished.
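The whole walkthrough can be sketched as one loop: the query is h3 at the first step and the latest decoder state afterwards, and a fresh context vector is computed every time. To keep it short, this sketch swaps the FC scorer for plain dot-product scoring and uses a hypothetical one-line decoder cell that ignores token embeddings; the output layer is replaced by a fixed, hypothetical `SCRIPT` that replays '난 널 사랑해', so only the flow of states and context vectors is meaningful. The encoder states `H` are made up.

```python
import math
from typing import List, Tuple

START, END = "<start>", "<end>"  # hypothetical special-token names

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attend(encoder_states: List[List[float]], query: List[float]) -> List[float]:
    """Dot-product attention (swapped in for the FC scorer): softmax weights, then weighted sum."""
    weights = softmax([dot(h, query) for h in encoder_states])
    dim = len(encoder_states[0])
    return [sum(a * h[k] for a, h in zip(weights, encoder_states)) for k in range(dim)]

def decoder_cell(prev_state: List[float], context: List[float]) -> List[float]:
    """Hypothetical decoder RNN cell dh_t = tanh(dh_{t-1} + cv_t); token embeddings are omitted."""
    return [math.tanh(s + c) for s, c in zip(prev_state, context)]

# Hypothetical output layer: replays the target sentence instead of predicting it.
SCRIPT = {START: "난", "난": "널", "널": "사랑해", "사랑해": END}

def translate(encoder_states: List[List[float]], max_len: int = 10) -> Tuple[List[str], List[List[float]]]:
    query = encoder_states[-1]  # first step: no decoder state yet, so use h3
    token = START
    outputs: List[str] = []
    contexts: List[List[float]] = []
    for _ in range(max_len):
        cv = attend(encoder_states, query)  # a fresh context vector at every step
        contexts.append(cv)
        dh = decoder_cell(query, cv)        # dh1, dh2, ... in the walkthrough
        token = SCRIPT[token]               # stand-in for a softmax over the vocabulary
        if token == END:
            break
        outputs.append(token)
        query = dh                          # the next step attends with the decoder state
    return outputs, contexts

H = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # made-up encoder states h1, h2, h3
outputs, contexts = translate(H)
print(outputs)        # ['난', '널', '사랑해']
print(len(contexts))  # 4
```

The four context vectors correspond to cv1 to cv4 in the walkthrough; the last one is computed from dh3 in the step that emits the End token.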

Key point

  • The attention weights decide which of the encoder states the decoder focuses on at each step.

Summary

  • Sequence-to-sequence is an encoder-decoder architecture.
  • Sequence-to-sequence does not use all of the states produced by the encoder; it translates using only the last output vector, the context vector.
  • As a result, when the sentence gets long, that vector is too small to compress the information of every word.
  • The attention mechanism solves this.
  • Because the attention mechanism builds a new context vector at each decoding step from the states of all the encoder's RNN cells, every encoder state can be used in the translation.
