Systematic Offensive Stereotyping (SOS) Bias in Language Models

Elsafoury, Fatma

arXiv:2308.10684 (cs)

[Submitted on 21 Aug 2023 (v1), last revised 26 Apr 2024 (this version, v2)]

Abstract: In this paper, we propose a new metric to measure the systematic offensive stereotyping (SOS) bias in language models (LMs). Then, we validate the SOS bias and investigate the effectiveness of removing it. Finally, we investigate the impact of the SOS bias in LMs on their performance and fairness in hate speech detection. Our results suggest that all the inspected LMs are SOS biased and that the SOS bias is reflective of the online hate experienced by marginalized identities. The results indicate that using debiasing methods from the literature worsens the SOS bias in LMs for some sensitive attributes and improves it for others. Finally, our results suggest that the SOS bias in the inspected LMs has an impact on their fairness in hate speech detection. However, there is no strong evidence that the SOS bias has an impact on hate speech detection performance.

Comments:	Keywords: Systematic offensive stereotyping (SOS) bias, Language models, bias removal, fairness, hate speech detection
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2308.10684 [cs.CL]
	(or arXiv:2308.10684v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.10684

Submission history

From: Fatma Elsafoury
[v1] Mon, 21 Aug 2023 12:37:42 UTC (339 KB)
[v2] Fri, 26 Apr 2024 08:45:35 UTC (530 KB)
