Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter (Review)

A paper that breaks the causes of misclassification into fine-grained categories and analyzes each one.

Abstract

  1. Our analysis of the language in typical datasets shows that hate speech lacks unique, discriminative features and is therefore found in the ‘long tail’ of a dataset, where it is difficult to discover.
  2. The authors propose Deep Neural Network structures serving as feature extractors that are particularly effective for capturing the semantics of hate speech.
  3. These methods are shown to outperform the best-performing baseline by up to 5 percentage points in macro-average F1, or 8 percentage points in the more challenging case of identifying hateful content.
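
The proposed feature extractor pairs a convolutional layer with a gated recurrent unit (CNN + GRU). Below is a minimal Keras sketch of that idea; the vocabulary size, sequence length, layer widths, and dropout rate are illustrative assumptions, not the paper's exact hyperparameters.

```python
# Minimal CNN + GRU text-classifier sketch (Keras).
# All sizes below are assumptions for illustration, not the paper's settings.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 50         # assumed max tweet length in tokens
NUM_CLASSES = 2      # hate / non-hate (some datasets use finer classes)

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 100),          # learn word vectors
    layers.Dropout(0.2),
    layers.Conv1D(100, 4, activation="relu"),   # capture local n-gram patterns
    layers.MaxPooling1D(pool_size=4),
    layers.GRU(100, return_sequences=True),     # model order among features
    layers.GlobalMaxPooling1D(),                # keep the strongest activations
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```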

5.4 Error Analysis

We manually analysed a sample of 200 tweets covering all classes to identify ones that are incorrectly classified by all methods.

We generally split these errors into four categories.

⇒ hate์ธ๋ฐ non-hate ์œผ๋กœ ํŒ๋‹จํ•œ ๊ฒฝ์šฐ์™€

⇒non-hate์ธ๋ฐ hate ์œผ๋กœ ํŒ๋‹จํ•œ ๊ฒฝ์šฐ๋ฅผ ๋ชจ๋‘ ํฌํ•จ

(๋ญ๋“  ๋ณธ๋ž˜ class๊ฐ€ ์•„๋‹Œ ๋‹ค๋ฅธclass๋กœ ๋ถ„๋ฅ˜๋œ ๊ฑด ๋‹ค ๊ณ ๋ ค)

1. Implicitness (46%) - cases where the hateful meaning is only implicit in the sentence (e.g., interpretable only with social and cultural background knowledge)

  • The largest group of errors.
  • The tweet does not contain explicit lexical or syntactic patterns that would serve as useful classification features.
  • Classification requires complicated reasoning plus cultural and social background knowledge (hate expressions carrying socio-cultural meaning).
  • False negative examples (ground truth is hate, but classified as non-hate):
  • ex) .. these same girls ... didn’t cook that well and aren’t very nice
  • ex) expecting gender equality is the same as genocide

2. Non-discriminative features (24%) - cases wrongly judged as hate simply because hate-related keywords are present

  • Classifiers were confused by certain features that are frequent and seemingly indicative of hate speech, but that can in fact be found in both hate and non-hate tweets.
  • ‘white trash’: the word ‘trash’ is a strong indicator of hate, so classifiers tend to label both examples below as hate.
  • ex1) White bus drivers are all white trash... [hate]
  • ex2) ... I’m a piece of white trash I say it proudly [non-hate]
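
A quick way to see why such a feature is non-discriminative is to count how often it appears in each class: if the proportions are similar, the feature alone carries little class signal. A minimal sketch on toy data (the corpus below is made up; in practice this would run over the full labeled dataset):

```python
# Sketch: check whether a surface feature actually discriminates classes.
from collections import Counter

# Toy labeled corpus; replace with the real tweet dataset.
corpus = [
    ("white bus drivers are all white trash", "hate"),
    ("i'm a piece of white trash i say it proudly", "non-hate"),
    ("take out the trash before you leave", "non-hate"),
]

feature = "trash"
counts = Counter(label for text, label in corpus if feature in text.split())
total = sum(counts.values())
for label, n in counts.items():
    print(f"'{feature}' in {label}: {n}/{total} ({n / total:.0%})")
# Similar proportions across classes mean the keyword by itself
# cannot separate hate from non-hate tweets.
```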

3. Contextual information (18%) - tweets that can only be interpreted together with additional information (a URL pointing to an image, video, website, another tweet, etc.)

  • The language itself does not imply hate. However, when it is combined with the video content referenced by the URL, the tweet incites hatred towards particular religious groups.
  • ex) what they tell you is their intention is not their intention. https://t.co/8cmfoOZwxz
  • The content referenced by URLs can be videos, images, websites, or even other tweets.
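
Handling this error class would require resolving the linked content, so a first step is pulling URLs out of the tweet text. A minimal sketch; the regex is a simplification, and a real pipeline would use the URL entities the Twitter API already provides:

```python
# Sketch: separate a tweet's text from the URLs it references
# (videos, images, websites, or other tweets).
import re

URL_PATTERN = re.compile(r"https?://\S+")

tweet = ("what they tell you is their intention is not their intention. "
         "https://t.co/8cmfoOZwxz")
urls = URL_PATTERN.findall(tweet)
text_only = URL_PATTERN.sub("", tweet).strip()

print(urls)       # ['https://t.co/8cmfoOZwxz']
print(text_only)  # tweet body with the link removed
```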

4. Disputable annotations (12%) - cases where it is hard to even see why the annotator labeled the tweet as hate in the first place

  • It is unclear why these count as hate; the annotation itself, rather than the classifier, may be at fault.
  • [Sexism] ex) He got one serve, not two. Had to defend the doubles lines also
  • [Sexism] ex) ‘@XXX Picwhatting? And you have quoted none of the tweets. What are you trying to say ...?’ is questioning a point raised in another tweet (the other tweet is the one that could be considered sexism), yet this tweet itself has been annotated as sexism.

Reference

Zhang, Ziqi, and Lei Luo. "Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter." Semantic Web 10.5 (2019): 925-945.