

[Dataset] Hate speech dataset

Dataset preparation

 

The hate speech dataset has to be downloaded first.

I used the following site as a reference for available datasets.

 

http://hatespeechdata.com/

 


 

Selecting the dataset to use

 

Narrowing the list down to datasets that are in English and were extracted from Twitter leaves Hatebase, Kaggle, and Waseem & Hovy, among others.

Of these, I decided to use the Waseem & Hovy dataset, since it was the one that appeared most often in hate speech detection papers.

 

 

| Dataset | Class | Size | Origin Source | Language |
| Hatebase | Hate, Offensive, Neither | 24,000 | Twitter | English |
| Kaggle | Insulting, Not insulting | 6,000 | Twitter | English |
| Waseem & Hovy | Sexism, Racism, None | 16,000 | Twitter | English |
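
The selection step above can be written down as a small filter over the candidate list. This is only a minimal sketch: the `DatasetInfo` record and the helper names `filter_candidates` and `find_by_name` are hypothetical, introduced here for illustration, and the entries are copied from the table rather than loaded from the actual datasets. Since the table already lists only English Twitter datasets, the filter keeps all three; the final choice of Waseem & Hovy is made by hand, as in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical record type for one row of the comparison table."""
    name: str
    classes: Tuple[str, ...]
    size: int        # approximate number of tweets, as listed in the table
    source: str
    language: str


# Entries copied from the table; nothing here is read from the real datasets.
CANDIDATES = [
    DatasetInfo("Hatebase", ("Hate", "Offensive", "Neither"), 24_000, "Twitter", "English"),
    DatasetInfo("Kaggle", ("Insulting", "Not insulting"), 6_000, "Twitter", "English"),
    DatasetInfo("Waseem & Hovy", ("Sexism", "Racism", "None"), 16_000, "Twitter", "English"),
]


def filter_candidates(datasets: List[DatasetInfo],
                      language: str = "English",
                      source: str = "Twitter") -> List[DatasetInfo]:
    """Keep only datasets matching the given language and origin source."""
    return [d for d in datasets if d.language == language and d.source == source]


def find_by_name(datasets: List[DatasetInfo], name: str) -> Optional[DatasetInfo]:
    """Return the entry with the given name, or None if it is not in the list."""
    return next((d for d in datasets if d.name == name), None)


shortlist = filter_candidates(CANDIDATES)
chosen = find_by_name(shortlist, "Waseem & Hovy")

print([d.name for d in shortlist])
if chosen is not None:
    print(chosen.name, chosen.classes, chosen.size)
```

Keeping the class labels next to each entry makes it easy to see that the three datasets use different label schemes, which matters later when deciding how to set up the classifier.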