Notes on a paper that breaks the causes of misclassification in hate speech detection into fine-grained categories
Abstract
- Our analysis of the language in the typical datasets shows that hate speech lacks unique, discriminative features and is therefore found in the ‘long tail’ of a dataset, which makes it difficult to discover.
- We then propose Deep Neural Network structures serving as feature extractors that are particularly effective for capturing the semantics of hate speech (an illustrative sketch follows this list).
- The proposed methods are shown to outperform the best performing method by up to 5 percentage points in macro-average F1, or 8 percentage points in the more challenging case of identifying hateful content.
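The abstract does not spell out the proposed architecture. As an illustration only, below is a minimal CNN + GRU feature-extractor stack in Keras, in the spirit of the authors' earlier CNN+GRU line of work; every layer choice, size, and hyperparameter here is an assumption, not the paper's exact configuration.

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 50         # assumed maximum tweet length in tokens
NUM_CLASSES = 3      # e.g. racism / sexism / neither

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),          # token embeddings
    layers.Dropout(0.2),
    layers.Conv1D(100, 4, activation="relu"),   # local n-gram patterns
    layers.MaxPooling1D(4),
    layers.GRU(100),                            # longer-range ordering info
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The convolution acts as the feature extractor over embedded tokens, and the recurrent layer summarizes the sequence of convolutional features before classification.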
5.4 Error Analysis
We manually analysed a sample of 200 tweets covering all classes to identify ones that are incorrectly classified by all methods.
We generally split these errors into four categories (the sampling step is sketched after this list).
⇒ includes cases where the true label is hate but the tweet was judged non-hate
⇒ and cases where the true label is non-hate but the tweet was judged hate
(any tweet assigned to a class other than its true one is counted as an error, whichever wrong class it was)
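A sketch of the sampling step described above: keep only tweets that every method misclassified, then draw a sample for manual review. The variable names and data layout are assumptions for illustration, not the authors' code.

```python
import random

def misclassified_by_all(tweets, gold, predictions_by_method):
    """predictions_by_method: {method_name: list of predicted labels}."""
    errors = []
    for i, tweet in enumerate(tweets):
        # A tweet counts as an error only if every method assigned it a
        # class other than its true class, whichever wrong class that was.
        if all(preds[i] != gold[i] for preds in predictions_by_method.values()):
            errors.append((tweet, gold[i]))
    return errors

# errors = misclassified_by_all(tweets, gold, preds_by_method)
# sample = random.sample(errors, 200)   # the 200 tweets reviewed manually
```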
1. Implicitness (46%) - cases where the tweet is only implicitly hateful (e.g., understandable only with socio-cultural background knowledge)
- the largest group of errors
- the tweet does not contain explicit lexical or syntactic patterns that serve as useful classification features
- it requires complicated reasoning and cultural and social background knowledge (hate expressions carrying socio-cultural meaning)
- false negative examples (the true label is hate, but it was classified as non-hate):
- ex) .. these same girls ... didn’t cook that well and aren’t very nice
- ex) expecting gender equality is the same as genocide
2. Non-discriminative features (24%) - cases wrongly judged as hate simply because they contain hate-indicative keywords
- classifiers were confused by certain features that are frequent and seemingly indicative of hate speech but in fact can be found in both hate and non-hate tweets (tweets full of hate-indicative words that may not actually be hate)
- ‘white trash’: the word ‘trash’ is a strong indicator of hate, so classifiers tend to label both examples below as hate (a quick check is sketched after the examples)
- ex1) White bus drivers are all white trash... [hate]
- ex2) ... I’m a piece of white trash I say it proudly [non-hate]
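One quick way to see that a feature like "white trash" is non-discriminative is to look at the label distribution of tweets containing it. The corpus interface below (a list of (text, label) pairs) is a hypothetical assumption for illustration.

```python
from collections import Counter

def class_distribution(corpus, pattern):
    """Label distribution among tweets containing a surface pattern."""
    counts = Counter(label for text, label in corpus
                     if pattern in text.lower())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

# e.g. class_distribution(corpus, "white trash")
# A near-uniform result such as {"hate": 0.5, "non-hate": 0.5} means the
# pattern carries little class information despite looking hateful.
```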
3. Contextual information (18%) - tweets that must be interpreted together with additional information (a URL pointing to a photo, video, website, another tweet, etc.)
- the language of the tweet itself does not imply hate; however, when combined with the video content referenced by the link, the tweet incites hatred towards particular religious groups
- ex) what they tell you is their intention is not their intention. https://t.co/8cmfoOZwxz
- the content referenced by URLs can be videos, images, websites, or even other tweets (a simple URL-flagging sketch follows)
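A minimal sketch for flagging tweets whose interpretation may depend on linked content. The regex and the idea of using URL presence as a proxy for "needs external context" are assumptions for illustration, not the paper's method.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def needs_external_context(tweet_text):
    """True if the tweet links to content (video, image, website, another
    tweet) that may be required to interpret it."""
    return bool(URL_RE.search(tweet_text))

# needs_external_context("what they tell you is their intention is not "
#                        "their intention. https://t.co/8cmfoOZwxz")  # True
```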
4. Disputable annotations (12%) - cases where it is unclear why the annotator labeled the tweet as hate in the first place
- it is hard to see why these count as hate at all → the labels themselves may be wrong from the start (one way to quantify this is sketched below)
- [Sexism] ex) He got one serve, not two. Had to defend the doubles lines also
- [Sexism] ex) @XXX Picwhatting? And you have quoted none of the tweets. What are you trying to say ...? This tweet merely questions a point raised in another tweet and does not itself read as sexism, yet it has been annotated as sexism.
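The excerpt reports no agreement numbers; as a hedged aside, a standard way to quantify how disputable such annotations are is to re-label a sample and measure inter-annotator agreement, e.g. with Cohen's kappa from scikit-learn (the labels below are toy values).

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels: the dataset's original annotations vs. a second pass.
original = ["sexism", "none", "sexism", "racism", "none"]
second   = ["none",   "none", "sexism", "racism", "sexism"]

kappa = cohen_kappa_score(original, second)
print(f"Cohen's kappa = {kappa:.2f}")  # low agreement -> disputable labels
```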
Paper reference
Zhang, Ziqi, and Lei Luo. "Hate speech detection: A solved problem? The challenging case of long tail on Twitter." Semantic Web 10.5 (2019): 925-945.