Ethical and technical challenges of AI in tackling hate speech
DOI: https://doi.org/10.29173/irie416
Keywords: Artificial Intelligence, Ethics, Online Harms, Hate Speech, Bias
Abstract
In this paper, we discuss ethical and technical challenges of using Artificial Intelligence for online content moderation. As a case study, we use an AI model developed to detect hate speech on social networks, a concept for which the scientific literature offers varying definitions and no consensus. We argue that while AI can play a central role in dealing with information overload on social media, a poorly conducted project risks violating freedom of expression. We present ethical and technical challenges that arise throughout the pipeline of an AI project, from data collection to model evaluation, and that hinder the large-scale use of hate speech detection algorithms. Finally, we argue that AI can assist with the detection of hate speech on social media, provided that the final judgment about the content is made through a process with human involvement.
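The human-involvement requirement described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `classify` stub stands in for a trained hate speech classifier (the paper does not prescribe one), and the threshold values are hypothetical. The point is the routing logic: the model never removes content on its own, and uncertain cases go to a human moderator.

```python
# Hypothetical human-in-the-loop moderation routing.
# classify() and the thresholds are illustrative assumptions,
# not part of the paper's method.
from dataclasses import dataclass


@dataclass
class ModerationResult:
    text: str
    hate_score: float  # model confidence that the text is hate speech, in [0, 1]
    decision: str      # "allow", "human_review", or "remove_candidate"


def classify(text: str) -> float:
    """Stand-in for a trained hate speech classifier (e.g. a fine-tuned transformer)."""
    # Toy heuristic for demonstration only.
    return 0.9 if "hate" in text.lower() else 0.1


def route(text: str, low: float = 0.3, high: float = 0.8) -> ModerationResult:
    """Route content by model confidence; humans keep the final judgment.

    Even high-confidence cases are only *flagged* as removal candidates,
    and everything in the uncertain band goes straight to a human moderator.
    """
    score = classify(text)
    if score < low:
        decision = "allow"
    elif score > high:
        decision = "remove_candidate"  # a human still makes the final call
    else:
        decision = "human_review"
    return ModerationResult(text, score, decision)


print(route("have a nice day").decision)  # low score: content is allowed
```

The design choice this sketch encodes is the paper's central claim: the algorithm filters the volume of content, but no removal decision is fully automated.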
License
Under the CC-BY 4.0 license, you have the right to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.