[Analysis Method] What Is HAN (Hierarchical Attention Network)?

While deciding which deep learning algorithm to use for hate speech detection, I came across the HAN algorithm.

HAN (Hierarchical Attention Network) is a deep learning algorithm specialized for document classification.

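Features of HAN

The figure above shows the structure of the HAN algorithm.

1) Reflects the hierarchical structure of documents

A document is made up of sentences, and a sentence is made up of words. HAN's architecture mirrors this hierarchy, which makes it well suited to classifying documents.

2) Attention mechanism

It can give more weight to the important words and sentences.

Together, these two features improve document classification performance.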

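Let's dig a little deeper into the structure.

L : the number of sentences in the document

s_i : the i-th sentence of the document

T_i : the number of words in sentence s_i

w_it with t ∈ [1, T] : the t-th word of the i-th sentence (T is shorthand for T_i here)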
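To make the notation concrete, here is a minimal sketch with a made-up two-sentence document. The names `document`, `T`, and `word` are hypothetical and exist only for this illustration.

```python
# A toy document: a list of sentences, each a list of word tokens (made-up data).
document = [
    ["the", "food", "was", "great"],
    ["service", "was", "slow"],
]

L = len(document)                           # L: number of sentences in the document
T = [len(sentence) for sentence in document]  # T[i - 1] plays the role of T_i


def word(i, t):
    """Return w_it using the 1-based indices of the notation above."""
    return document[i - 1][t - 1]


print(L, T, word(2, 3))
```

Python lists are 0-based while the notation is 1-based, which is why `word` subtracts 1 from both indices.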

Word Encoder

1) Word embedding

Embed the words into vectors through an embedding matrix.

2) Bidirectional GRU

A bidirectional GRU is used to get word annotations by summarizing information from both directions, so that each annotation incorporates contextual information.

forward : reads sentence s_i from w_i1 to w_iT

backward : reads sentence s_i from w_iT to w_i1

Concatenating the forward hidden state and the backward hidden state gives the annotation h_it for a given word w_it.

h_it summarizes the information of the whole sentence, centered around w_it.
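The word encoder can be sketched in plain NumPy as below. This is only an illustration under stated assumptions: the parameters are random and untrained, the dimensions and word ids are made up, and `init_gru`, `gru_run`, and `bi_gru` are hypothetical helper names. The GRU update follows the standard gate equations; a real model would use a deep learning framework and learn the weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Random, untrained GRU parameters (hypothetical helper for this sketch)."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b" + gate] = np.zeros(hidden_dim)
    return p


def gru_run(p, xs):
    """Run a GRU over xs of shape (T, input_dim); return all states (T, hidden_dim)."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])              # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])              # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + r * (p["Uh"] @ h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)


def bi_gru(fwd, bwd, xs):
    """Concatenate forward and backward hidden states for every position."""
    h_fwd = gru_run(fwd, xs)               # reads w_i1 ... w_iT
    h_bwd = gru_run(bwd, xs[::-1])[::-1]   # reads w_iT ... w_i1, re-aligned to word order
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 10, 4, 3
W_e = rng.normal(size=(vocab_size, embed_dim))  # embedding matrix, one row per word id
word_ids = np.array([1, 5, 2, 7])               # a toy sentence of T = 4 word ids
x = W_e[word_ids]                               # embedded words, shape (T, embed_dim)

fwd = init_gru(embed_dim, hidden_dim, rng)
bwd = init_gru(embed_dim, hidden_dim, rng)
h = bi_gru(fwd, bwd, x)
print(h.shape)  # (4, 6): one annotation h_it per word, twice the hidden size
```

Looking up a row of `W_e` by word id is equivalent to multiplying the embedding matrix by a one-hot word vector. The backward states are reversed again after the pass so that row t of `h` always refers to the same word in both halves.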

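Word Attention

Not all words contribute equally to the meaning of a sentence.
So an attention mechanism is introduced that extracts the words important to the sentence's meaning and aggregates the representations of those informative words into a sentence vector.

(5) To get u_it (a hidden representation of h_it), feed h_it (the word annotation) through a one-layer MLP.

(6) Measure the importance of each word as the similarity between u_it and u_w (the word-level context vector).

→ A softmax function then gives α_it, the normalized importance weight.

(7) Compute s_i (the sentence vector) as the weighted sum of the word annotations, using those weights.

What is the role of u_it?
h_it is the hidden state obtained by combining the forward and backward GRU states.
→ Feeding it into a one-layer MLP produces u_it.
→ So u_it is a hidden representation of h_it.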
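Steps (5) to (7) can be sketched in NumPy as follows. The tanh activation for the one-layer MLP follows the cited paper. Everything else is an assumption for illustration: the sizes, the random stand-in annotations `h`, and the function name `attention_pool`.

```python
import numpy as np


def attention_pool(h, W, b, u_ctx):
    """h: (T, d) annotations -> (weights of shape (T,), pooled vector of shape (d,))."""
    u = np.tanh(h @ W.T + b)             # (5) u_it: hidden representation of h_it
    scores = u @ u_ctx                   # similarity between each u_it and u_w
    e = np.exp(scores - scores.max())    # subtracting the max keeps exp() stable
    alpha = e / e.sum()                  # (6) softmax -> normalized weights alpha_it
    s = alpha @ h                        # (7) weighted sum of the word annotations
    return alpha, s


rng = np.random.default_rng(0)
T, d, a = 5, 6, 4                        # words, annotation size, attention size (made up)
h = rng.normal(size=(T, d))              # stand-in for the word annotations h_it
W_w = rng.normal(size=(a, d))
b_w = np.zeros(a)
u_w = rng.normal(size=a)                 # word-level context vector

alpha, s_i = attention_pool(h, W_w, b_w, u_w)
print(alpha.round(3), s_i.shape)
```

Because the weights are positive and sum to 1, the sentence vector `s_i` is a convex combination of the word annotations, so each of its components stays between the smallest and largest value of that component across the words.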

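Sentence Encoder

Given the sentence vectors s_i,
a document vector can be obtained in a similar way ⇒ bidirectional GRU

The annotation of the i-th sentence, h_i, is obtained by concatenating the forward and backward hidden states for sentence i.

h_i summarizes the neighboring sentences around sentence i but still focuses on sentence i.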

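Sentence Attention

To reward the sentences that are clues for classifying the document correctly, an attention mechanism is used again.

(9) Measure the importance of each sentence as the similarity between u_i and u_s (the sentence-level context vector). Here u_i is the hidden representation of h_i, obtained with a one-layer MLP just as u_it was at the word level.

→ A softmax function then gives α_i, the normalized importance weight.

(10) Compute v (the document vector), which summarizes all the information of the sentences in the document.

The sentence-level context vector is randomly initialized and jointly learned during training.
→ (Very important) Both u_w (word-level context vector) and u_s (sentence-level context vector) from above are randomly initialized vectors.
→ They are random vectors, unrelated to any vectors created in advance.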
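The sentence-level step has the same shape as the word-level one. The sketch below also illustrates why the context vector matters: if u_s is all zeros, every sentence gets the same weight and v is just the mean of the annotations. The sizes, the random stand-in annotations, and the name `sentence_attention` are assumptions for illustration, and no training is performed here.

```python
import numpy as np


def sentence_attention(h, W_s, b_s, u_s):
    """h: (L, d) sentence annotations -> (weights of shape (L,), document vector (d,))."""
    u = np.tanh(h @ W_s.T + b_s)         # u_i: hidden representation of h_i
    scores = u @ u_s                     # similarity between each u_i and u_s
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                  # (9) normalized sentence weights alpha_i
    v = alpha @ h                        # (10) document vector
    return alpha, v


rng = np.random.default_rng(1)
L, d, a = 3, 6, 4                        # sentences, annotation size, attention size (made up)
h = rng.normal(size=(L, d))              # stand-in for the sentence annotations h_i
W_s = rng.normal(size=(a, d))
b_s = np.zeros(a)

# u_s starts as a random vector; in a real model, training would update it
# together with all the other parameters.
u_s = rng.normal(scale=0.1, size=a)
alpha, v = sentence_attention(h, W_s, b_s, u_s)

# With an all-zero context vector every score is 0, so the weights are uniform.
alpha_zero, v_zero = sentence_attention(h, W_s, b_s, np.zeros(a))
print(alpha.round(3), alpha_zero.round(3))
```

So the context vector is what lets the model prefer some sentences over others, and its useful values come from training rather than from its random starting point.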

Document classification

v (document vector) : a high-level representation of the document

(11) It can be used as features for document classification.

(12) The training loss is the negative log-likelihood of the correct labels.

j : the label of document d
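Steps (11) and (12) can be sketched as a softmax classifier over the document vectors plus a summed negative log-likelihood, as in the cited paper. The document vectors, labels, and sizes below are made up, and `nll_loss` is a hypothetical name for this illustration.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def nll_loss(V, labels, W_c, b_c):
    """V: (D, d) document vectors, labels: (D,) class ids -> scalar loss."""
    p = softmax(V @ W_c.T + b_c)                        # (11) class probabilities, (D, C)
    p_correct = p[np.arange(len(labels)), labels]       # p_dj: probability of the true label j
    return -np.log(p_correct).sum()                     # (12) negative log-likelihood


rng = np.random.default_rng(0)
D, d, C = 4, 6, 2                         # documents, vector size, classes (made up)
V = rng.normal(size=(D, d))               # stand-in for the document vectors v
labels = np.array([0, 1, 1, 0])           # made-up labels j, one per document

# With all-zero classifier weights every class has probability 1 / C,
# so the loss equals D * log(C).
loss_untrained = nll_loss(V, labels, np.zeros((C, d)), np.zeros(C))
loss_random = nll_loss(V, labels, rng.normal(size=(C, d)), np.zeros(C))
print(loss_untrained, loss_random)
```

Training lowers this loss by raising the probability assigned to each document's correct label.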

Reference

Yang, Zichao, et al. "Hierarchical attention networks for document classification." Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 2016.
