
Assessing Bias in Large Language Models


By Ronald E. Bulanda & Jennifer Roebuck Bulanda

Introduction

AI technologies, including large language models (LLMs) like ChatGPT, continue to develop and permeate the academic landscape at an accelerated pace. As student use of AI for coursework increases, it is especially important for educators to help train students to work with these technologies. Specifically, it is necessary to better understand the relationship between the design of these systems, the output they generate, and how users interpret that output. Although calls for such assessments may highlight their specific relevance to fields such as media studies, geography, and law (see Liu, 2021), AI literacy is important for all faculty and students in higher education who must evaluate the text these systems produce. It equips all learners to be more critical consumers of information, and it also facilitates the design and use of coursework that accounts for (and perhaps minimizes) bias.

The first step in establishing AI literacy is to understand how LLMs operate and to be aware of their fallibility. LLMs such as ChatGPT are trained on existing, human-generated writing and use statistical modeling techniques to predict which text is most likely to follow any given phrasing. The statistical modeling driving the textual output can result in falsehoods, inaccuracies, and what are often termed AI “hallucinations.” However, this use of existing writing to generate new writing also raises concerns about the ways LLMs may perpetuate -- without questioning or challenging -- existing social structures and systems of stratification. To this end, generative AI may promulgate inequalities, reinforce bias, spread misinformation, and/or contribute to the increasing polarization of society.
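To make the idea of statistical text prediction concrete, the sketch below builds a toy word-frequency (bigram) model in Python. It is a deliberate simplification, and the tiny corpus and function names are invented for illustration, but it shows how a model’s “predictions” simply mirror the patterns, and the skews, of whatever text it was trained on.

```python
# A toy "next word" predictor (a bigram model) built from a tiny, invented
# corpus. Real LLMs are vastly more sophisticated, but the core idea is the
# same: output follows the statistical patterns of the training text.
from collections import Counter, defaultdict

corpus = ("the doctor said he was busy . "
          "the doctor said he was late . "
          "the nurse said she was busy .").split()

# Count which word follows each word in the training text.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word observed during training."""
    return following[word].most_common(1)[0][0]

print(predict_next("doctor"))  # "said"
print(predict_next("said"))    # "he" -- the majority pattern in the training text
```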

Emerging research has started to identify pathways by which biases may become embedded in the output generated by AI systems. Knowledge of these pathways is important because it promotes more critical consumption of the output these systems generate. Three specific pathways likely to introduce bias into AI output are the training data, the algorithms, and human intervention efforts aimed at (re)training the systems, including efforts meant to reduce bias (Ferrara, 2023; Smith & Rustagi, 2020).

Why Might AI Be Biased?

The Training Problem

At its core, the training of AI systems is inherently problematic in that a system can produce only what it is taught, and what it is taught is primarily data scraped from the internet. These data include misinformation and content lacking empirical support, and they are drawn from a non-random, non-representative sample of creators (Melchionna, 2023). AI’s inability to discern truth from myth or to fact-check against empirical sources can contribute to the production of inaccurate and misleading output. Compounding this problem, the existing data used to train AI systems are likely to overrepresent content generated by Western men, particularly white men. We know that image-generating AI continues to produce output that treats Whiteness and American culture as standards (Bianchi et al., 2023) and, without supporting textual explanation or discussion to supplement these images, reflects gender and racial disparities in the images it produces. Ultimately, it is important to recognize that AI systems are limited in their ability to gauge the quality of the data they train on and that the scope of these data inherently excludes many diverse standards and perspectives (Wellner & Rothman, 2020).

The Algorithms Problem

Algorithms are rules or formulas employed to make calculations or solve problems. Perhaps because of their scientific and mathematical associations, these sets of rules or formulas often connote a sense of neutrality or correctness. The high regard already granted to computer output grows even stronger when that output is attached to the idea of an algorithm (Airoldi, 2021). In the world of LLMs, the terminology of artificial “intelligence” is likely to reinforce this impression (see Joyce et al., 2021). However, the authors (and editors) of the rules and formulas that propel systems like ChatGPT bring their own biases and worldviews to those equations. These biases, while sometimes imperceptible to the individual, are not insignificant. Our cultural and subcultural standards, including ideas about gender, race, religion, and “simple” matters of right and wrong, are so deeply embedded in our worldviews that they promote blind spots to other ways of living and seeing the world. Similarly, and related to “the training problem,” training data and the resulting algorithms do not necessarily reflect the world accurately. Indeed, there are significant data gaps that promote biases related to gender, race, and class (Smith & Rustagi, 2020). The algorithms may also be constructed in ways that give greater weight to some aspects or data points, reinforcing biases in the original data (Ferrara, 2023). This has the effect of AI systems “baking in and deploying biases at scale” (Manyika, Silberg, & Presten, 2019).

An additional way AI can introduce bias is through the process of generalization. In these instances, the system applies its understanding of the data on which it was “trained” to new, unfamiliar inputs (Ferrara, 2023). Essentially, it generalizes its understanding to new data, even when it may not be useful or correct to do so. AI may not always (or consistently) account for new patterns or have sufficient capacity to generate the most contextually relevant output. In other words, it does not break the mold it was trained to look for or apply (Melchionna, 2023). Amplifying this limitation is the fact that the data and the algorithms themselves are generally neither available nor decipherable to the typical AI user (Mittelstadt, Allo, Taddeo, Wachter, & Floridi, 2016). In turn, these algorithms do not lend themselves to public scrutiny or critique and can promote echo chambers of sorts. The combination of this echo chamber effect with training data that lacks diverse cultural standards supports concerns by scholars of digital culture that LLMs like ChatGPT are essentially monocultural.
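As a rough illustration of this kind of overgeneralization, the hypothetical sketch below “learns” a single coarse pattern from a handful of invented, labeled sentences and then applies it to a new sentence where the pattern no longer holds. Real systems learn far more complex patterns, but the failure mode is analogous.

```python
# Hypothetical sketch of overgeneralization: a pattern learned from limited
# training examples is applied to unfamiliar input where it does not hold.

# Invented training sentences labeled by sentiment.
training_examples = [
    ("the battery life is long", "positive"),
    ("the screen stays bright for a long time", "positive"),
    ("the wait for support was long", "negative"),
]

# Suppose the system distills one coarse rule from majority usage:
# "long" appeared mostly in positive examples, so treat it as positive.
learned_rule = {"long": "positive"}

def classify(sentence: str) -> str:
    """Apply the learned pattern to new input, whether or not it fits."""
    for word, label in learned_rule.items():
        if word in sentence:
            return label
    return "unknown"

# A new context where the learned association is wrong.
print(classify("the delivery delay was long"))  # prints "positive"
```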

The Human Intervention Problem

The design of ChatGPT includes steps to mitigate bias and misinformation in its text production; in fact, when asked directly, ChatGPT will acknowledge this. These steps include providing guidelines for human reviewers and maintaining a “feedback” loop so those reviewers can continue to improve the model’s performance over time as they learn more about the output it generates. However, these efforts remain vulnerable to the biases of the human reviewers and cannot eliminate bias and misinformation (Arvanitis, Sadeghi, & Brewster, 2023). Again, AI can only do what it is trained to do, and neither the data nor the trainers are inherently neutral.

The training of AI systems is also delegated to users through a process known as “Reinforcement Learning from Human Feedback” (RLHF). Here, users of the system can rate the output as good (thumbs up) or bad (thumbs down). This feedback is gathered and used collectively to give direction to the next iteration of the LLM. The process again illustrates how biases may be built into a system’s design and output: output that the majority of users rate as good is more likely to be retained, while output favored only by a minority may be eliminated or adjusted. Given the demographic imbalances known to exist in societies, likely compounded by the digital divide (Lutz, 2019), this pathway for refining the LLM is likely to reinforce the preferences of the majority and suppress those of minorities. Ultimately, the biases of AI will continue to reflect the biases of its users.
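A highly simplified sketch of this aggregation step appears below. The response names and vote counts are invented, and real RLHF fits a reward model rather than taking a simple majority vote, but the sketch shows how an aggregate signal reflects whoever supplies the feedback.

```python
# Toy illustration (invented data) of how aggregated thumbs-up feedback
# privileges majority preferences when refining a model.
from collections import Counter

# Hypothetical votes on two candidate responses to the same prompt:
# 90 users preferred response A's framing, 10 preferred response B's.
feedback = Counter({"response_A": 90, "response_B": 10})

def select_preferred(votes: Counter) -> str:
    """Keep whichever response the majority rewarded; the minority
    perspective carries almost no weight in the next iteration."""
    return votes.most_common(1)[0][0]

print(select_preferred(feedback))  # prints "response_A"
```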

In 2022, researchers reported that the popular image-generating AI system DALL·E worked to avoid evidence of biased output (non-representative images lacking racial/ethnic diversity) by altering user prompts behind the scenes to intentionally produce images of non-White people (Ilube, 2022). This is similar to the current operation of ChatGPT, which flags and avoids what it has been taught to identify as inappropriate conversations. To be sure, this modification reflects a goal of avoiding offensive content, but such a modification cannot be made without first introducing some group’s conceptualization of what is normative and acceptable versus what is deviant and unacceptable. Indeed, no standard employed by these systems using “big data” is capable of adequately representing all, or even most, groups.
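The sketch below is a hypothetical reconstruction of this kind of prompt modification; the descriptor list and function are invented for illustration, not taken from any actual system. The key point is that someone must write the list, and that choice encodes a particular standard of what “representative” output looks like.

```python
# Hypothetical sketch of silent prompt modification before image generation.
import random

# Whoever writes this list is encoding their own standard of what counts
# as appropriately diverse output -- the normative choice discussed above.
APPENDED_DESCRIPTORS = ["Black", "South Asian", "Latina", "East Asian"]

def rewrite_prompt(user_prompt: str) -> str:
    """Invisibly append a demographic descriptor to prompts about people."""
    if "person" in user_prompt.lower():
        return f"{user_prompt}, {random.choice(APPENDED_DESCRIPTORS)}"
    return user_prompt

print(rewrite_prompt("a portrait of a person at work"))
```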

Consequences of AI Bias

Because of the problems discussed above, multiple types of bias can be present in the text generative AI produces. These include demographic bias, in which some groups (e.g., racial-ethnic, age, or gender groups) are treated in a discriminatory manner, and cultural bias, in which the text AI produces reflects past and/or current stereotypes and prejudices (Ferrara, 2023). This latter type of bias may undermine current efforts to reject and move away from existing bias by reintroducing and reinforcing it. Generative AI may also evidence linguistic bias (i.e., privileging ideas from English-language text), ideological bias, and political bias (Ferrara, 2023). The impact of these biases can be quite significant, even (and perhaps especially) when they are difficult to locate and identify. Prioritizing the perspectives and information held and created by privileged and wealthier populations will only serve to reify existing hierarchies (Smith & Rustagi, 2020).

For instance, the problem of biased algorithms resulting in gender bias is evident in Amazon’s attempt to use AI to review job applicants’ resumes (Dastin, 2018). Because the AI was trained on the resumes of previously successful applicants, and because male employees dominated the tech industry, the system ultimately “learned,” or embedded, this inequitable distribution as part of its calculations. While this bias was eventually noticed and the algorithm discontinued, it highlights just one way in which the patterns AI learns from (and which human reviewers of algorithms may overlook) can unintentionally (re)produce bias in AI output.
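The sketch below uses invented resume data to show, in miniature, how training on past hiring outcomes can reproduce their bias; it is not Amazon’s actual system, and the terms and scores are hypothetical.

```python
# A deliberately simplified sketch (invented data, not Amazon's actual system)
# of how a screening model trained on past hiring outcomes absorbs their bias.

# Historical resumes as (terms, hired) pairs; past hiring skewed against
# resumes containing gendered terms, regardless of skills.
historical_resumes = [
    ({"python", "chess_club"}, 1),
    ({"java", "football"}, 1),
    ({"python", "womens_chess_club"}, 0),
    ({"java", "womens_coding_group"}, 0),
]

# "Training": score each term by how often it appears on hired vs. rejected resumes.
term_scores = {}
for terms, hired in historical_resumes:
    for term in terms:
        term_scores[term] = term_scores.get(term, 0) + (1 if hired else -1)

def screen(resume_terms: set) -> int:
    """Rank a new resume by summing the learned term scores."""
    return sum(term_scores.get(term, 0) for term in resume_terms)

# Two applicants with the same technical skill; the gendered term alone
# lowers the second score, echoing the historical pattern.
print(screen({"python", "chess_club"}))         # 1
print(screen({"python", "womens_chess_club"}))  # -1
```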

Similarly, there is early evidence that AI echoes the less humanized news coverage of Black homicide victims. Here, too, when asked directly, ChatGPT will offer examples of ways in which its output can give misleading impressions of people based on their race. For instance, in describing the Gold Rush in early American history, ChatGPT’s training data may lead it to suggest race-based inferiority and downplay the role of Asian immigrants in that era. The output may explicitly state that Asian immigrants mostly worked low-skilled jobs and (therefore) did not play a significant role in the development of the region. When this information comes from AI, which students are likely to consider an authoritative and impartial source of perspectives and information (Airoldi, 2021), they are unlikely to question its accuracy or to consider more nuanced understandings of the output. This lack of critical reflection will serve to perpetuate biases and stereotypes, and it may make it especially difficult to disseminate diverse worldviews and represent diverse cultures more equitably.

Conclusion

Overall, it is important for users of AI to realize that the processes by which computers generate their output, including algorithms, are designed by humans. The act of deciding the relative importance of information for LLMs is inherently subjective (Mittelstadt et al., 2016). The designs humans implement in computer technologies are also likely to reproduce and mirror existing misconceptions and biases (Wellner & Rothman, 2020). In this way, the output AI systems generate ultimately reproduces and reinforces biases and misconceptions. That output is also likely to underrepresent variation in cultural standards and values and/or overgeneralize the standards and values of those who created the AI’s training data and algorithms. For these reasons, the promotion of AI literacy among educators and students is paramount.

In sum, a necessary step in becoming AI literate is to understand how the design and implementation of these systems are imperfect, so that educators and students alike move past the misconception that these systems are independent and impartial sources of information. This better equips teachers to design coursework that utilizes AI, if they choose, to achieve specific learning outcomes for their curriculum. Moreover, AI literacy, particularly in terms of understanding AI’s potential for bias, works directly toward the goals of a liberal education by helping students become more critical consumers of information and more culturally competent.

Works Cited

Airoldi, M. (2021). Machine habitus: Toward a sociology of algorithms. Cambridge: Polity Press.

Arvanitis, L., Sadeghi, M., & Brewster, J. (2023, March). Despite OpenAI’s promises, the company’s new AI tool produces misinformation more frequently, and more persuasively, than its predecessor. NewsGuard Misinformation Monitor. https://www.newsguardtech.com/misinformation-monitor/march-2023/

Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., ... & Caliskan, A. (2023, June). Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 1493-1504).

Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.

Ferrara, E. (2023). Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv preprint arXiv:2304.03738. https://doi.org/10.48550/arXiv.2304.03738

Ilube, C. (2022, December). The hidden biases behind ChatGPT.

Joyce, K., Smith-Doerr, L., Alegria, S., Bell, S., Cruz, T., Hoffman, S. G., Noble, S. U., & Shestakofsky, B. (2021). Toward a sociology of artificial intelligence: A call for research on inequalities and structural change. Socius, 7. https://doi.org/10.1177/2378023121999581

Liu, Z. (2021). Sociological perspectives on artificial intelligence: A typological reading. Sociology Compass, 15, e12851.

Lutz, C. (2019). Digital inequalities in the age of artificial intelligence and big data. Human Behavior and Emerging Technologies, 1, 141-148.

Manyika, J., Silberg, J., & Presten, B. (2019, October 25). What do we do about the biases in AI? Harvard Business Review, https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai

Melchionna, L. C. M. (2023). Bias and fairness in artificial intelligence. New York State Bar Association. 

Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2). https://doi.org/10.1177/2053951716679679

Smith, G., & Rustagi, I. (2020). Mitigating bias in artificial intelligence: An equity fluent leadership playbook. Berkeley Haas Center for Equity, Gender and Leadership. https://haas.berkeley.edu/wp-content/uploads/UCB_Playbook_R10_V2_spreads2.pdf

Wellner, G., & Rothman, T. (2020). Feminist AI: Can we expect our AI systems to become feminist? Philosophy & Technology, 33, 191-205.
